METHODS AND COMPOSITIONS FOR THE RESPONSE PREDICTION OF MALIGNANT
NEOPLASIA TO TREATMENT

TECHNICAL FIELD OF THE INVENTION 

The invention relates to methods and compositions for the prediction, diagnosis, prognosis, 
prevention and treatment of neoplastic disease. Of particular interest is the response prediction of
neoplastic lesions to various therapeutic regimens. Neoplastic disease is often caused by 
chromosomal rearrangements which lead to over- or underexpression of the rearranged genes. The 
invention discloses genes which are overexpressed in neoplastic tissue and are useful as diagnostic 
markers and targets for treatment. Methods are disclosed for predicting, diagnosing and 
prognosing as well as preventing and treating neoplastic disease.

BACKGROUND OF THE INVENTION 

Chromosomal aberrations (amplifications, deletions, inversions, insertions, translocations and/or
viral integrations) are of importance for the development of cancer and neoplastic lesions, as they
account for deregulations of the respective regions. Amplifications of genomic regions have been
described, in which genes of importance for growth characteristics, differentiation, invasiveness or
resistance to therapeutic intervention are located. One of those regions with chromosomal
aberrations is the region carrying the HER-2/neu gene, which is amplified in breast cancer patients.
In approximately 25% of breast cancer patients the HER-2/neu gene is overexpressed due to gene
amplification. HER-2/neu overexpression correlates with a poor prognosis (relapse, overall
survival, sensitivity to therapeutics). The importance of HER-2/neu for the prognosis of the disease
progression has been described [Gusterson et al., 1992, (1)]. Gene-specific antibodies raised
against HER-2/neu (Herceptin™) have been generated to treat the respective cancer patients.
However, only about 50% of the patients benefit from the antibody treatment with Herceptin™,
which is most often combined with a chemotherapeutic regimen. The discrepancy among HER-2/neu
positive tumors (overexpressing HER-2/neu to a similar extent) with regard to responsiveness to
therapeutic intervention suggests that additional factors or genes might be involved in the
growth and apoptotic characteristics of the respective tumor tissues. There seems to be no
monocausal relationship between overexpression of the growth factor receptor HER-2/neu and
therapy outcome. In line with this, the measurement of commonly used tumor markers such as
estrogen receptor, progesterone receptor, p53 and Ki-67 provides only very limited information
on the clinical outcome of specific therapeutic decisions. Therefore there is a great need for a more
detailed diagnostic and prognostic classification of tumors to enable improved therapy decisions
and prediction of survival of the patients. The present invention addresses the need for additional
markers by providing genes whose expression is deregulated in tumors and correlates with clinical
outcome. One focus is the deregulation of genes present in specific chromosomal regions and their
interaction in disease development and drug responsiveness.

HER-2/neu and other markers for neoplastic disease are commonly assayed with diagnostic
methods such as immunohistochemistry (IHC) (e.g. HercepTest™ from DAKO Inc.) and
fluorescence in situ hybridization (FISH) (e.g. quantitative measurement of HER-2/neu and
Topoisomerase II alpha with a fluorescence in situ hybridization kit from VYSIS). Additionally,
HER-2/neu can be assayed by detecting HER-2/neu fragments in serum with an ELISA test
(BAYER Corp.) or with a quantitative PCR kit which compares the amount of HER-2/neu gene
with the amount of a non-amplified control gene in order to detect HER-2/neu gene amplifications
(ROCHE). These methods, however, exhibit multiple disadvantages with regard to sensitivity,
specificity, technical and personnel effort, cost, time consumption and inter-laboratory
reproducibility. These methods are also restricted with regard to the measurement of multiple
parameters within one patient sample ("multiplexing"). Usually only about 3 to 4 parameters
(e.g. genes or gene products) can be detected per tissue slide. Therefore, there is a need to
develop a fast and simple test to measure multiple parameters simultaneously in one sample.
The present invention addresses the need for a fast and simple high-resolution method that is
able to detect multiple diagnostic and prognostic markers simultaneously.

SUMMARY OF THE INVENTION 

The present invention is based on the discovery that chromosomal alterations in cancer tissues can
lead to changes in the expression of genes that are encoded by the altered chromosomal regions.
By way of example, 43 human genes have been identified that are co-amplified in neoplastic lesions
from breast cancer tissue, resulting in altered expression of several of these genes (Tables 1 to 4).
These 43 genes are differentially expressed in breast cancer states, relative to their expression in
normal, or non-breast cancer states. The present invention relates to derivatives, fragments,
analogues and homologues of these genes and uses or methods of using the same.

The present invention further relates to novel preventive, predictive, diagnostic, prognostic and 
therapeutic compositions and uses for malignant neoplasia and breast cancer in particular. 
Membrane-bound marker gene products containing extracellular domains can be
particularly useful targets for treatment methods as well as diagnostic and clinical monitoring
methods.

It is a discovery of the present invention that several of these genes are characterized in that their 
gene products functionally interact in signaling cascades or by directly or indirectly influencing 
each other. This interaction is important for the normal physiology of certain non-neoplastic
tissues (e.g. brain or neurogenic tissue). The deregulation of these genes in neoplastic lesions,
where they normally exhibit a different level of activity or are not active, however, results in
pathophysiology and affects the characteristics of the disease-associated tissue.

The present invention further relates to methods for detecting these deregulations in malignant
neoplasia at the DNA and mRNA level.

The present invention further relates to a method for the detection of chromosomal alterations 
characterized in that the relative abundance of individual mRNAs, encoded by genes, located in 
altered chromosomal regions is detected. 

The present invention further relates to a method for the detection of the flanking breakpoints of
said chromosomal alterations by measurement of DNA copy number by quantitative PCR or
DNA-arrays and DNA sequencing.

The present invention further relates to a method for the prediction, diagnosis or prognosis of
malignant neoplasia by the detection of DNA sequences that flank said genomic breakpoints or
are located within them.

The present invention further relates to a method for the detection of chromosomal alterations 
characterized in that the copy number of one or more genomic nucleic acid sequences located
within an altered chromosomal region(s) is detected by quantitative PCR techniques (e.g. 
TaqMan™, Lightcycler™ and iCycler™). 
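
The copy-number detection by quantitative PCR described above can be illustrated with a short calculation. The following is a minimal sketch under stated assumptions, not the procedure of the invention: it assumes the commonly used comparative Ct approach, an amplification efficiency of 2 (one doubling per cycle), a non-amplified reference gene, and a non-cancerous diploid sample as calibrator. The function name and all Ct values are hypothetical.

```python
def relative_copy_number(ct_target_tumor, ct_ref_tumor,
                         ct_target_normal, ct_ref_normal,
                         efficiency=2.0, normal_copies=2.0):
    """Estimate the copy number of a target locus in a tumor sample.

    Assumptions (not taken from the source text): comparative Ct method,
    constant amplification efficiency, and `normal_copies` copies of the
    target locus in the non-cancerous calibrator sample.
    """
    # Normalize the target to the non-amplified reference gene in each sample
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_normal = ct_target_normal - ct_ref_normal
    # Compare the tumor sample with the calibrator sample
    delta_delta = delta_tumor - delta_normal
    # A lower Ct means more template, hence the negative exponent
    return normal_copies * efficiency ** (-delta_delta)


# Hypothetical Ct values: the target appears 3 cycles earlier in the tumor
print(relative_copy_number(22.0, 25.0, 25.0, 25.0))  # prints 16.0
```

With these illustrative numbers, a shift of three cycles corresponds to an eightfold gain relative to the diploid calibrator. Real assays would need measured efficiencies and replicate measurements.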

The present invention further relates to a method for the prediction, diagnosis or prognosis of 
malignant neoplasia by the detection of at least 2 markers whereby the markers are genes and 
fragments thereof or genomic nucleic acid sequences that are located on one chromosomal region
which is altered in malignant neoplasia and breast cancer in particular. 

The present invention also discloses a method for the prediction, diagnosis or prognosis of 
malignant neoplasia by the detection of at least 2 markers whereby the markers are located on one 
or more chromosomal region(s) which is/are altered in malignant neoplasia; and the markers 
interact as (i) receptor and ligand or (ii) members of the same signal transduction pathway or
(iii) members of synergistic signal transduction pathways or (iv) members of antagonistic signal
transduction pathways or (v) transcription factor and transcription factor binding site. 

Also disclosed is a method for the prediction, diagnosis or prognosis of malignant neoplasia by the
detection of at least one marker whereby the marker is a VNTR, SNP, RFLP or STS which is
located on one chromosomal region which is altered in malignant neoplasia due to amplification
and the marker is detected in (a) a cancerous and (b) a non-cancerous tissue or biological sample
from the same individual. A preferred embodiment is the detection of at least one VNTR marker of
Table 6 or at least one SNP marker of Table 4 or combinations thereof. Even more preferably,
the detection, quantification and sizing of such polymorphic markers can be achieved by methods
(a) for the comparative measurement of amount and size by PCR amplification and subsequent
capillary electrophoresis, (b) for sequence determination and allelic discrimination by gel
electrophoresis (e.g. SSCP, DGGE), real-time kinetic PCR, direct DNA sequencing, pyrosequencing,
mass-specific allelic discrimination or resequencing by DNA array technologies, (c) for the
determination of specific restriction patterns and subsequent electrophoretic separation and (d) for
allelic discrimination by allele-specific PCR (e.g. ASO). An even more favorable detection of a
heterozygous VNTR, SNP, RFLP or STS is done in a multiplex fashion, utilizing a variety of
labeled primers (e.g. fluorescent, radioactive, bioactive) and a suitable capillary electrophoresis
(CE) detection system.
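
The comparative measurement of a heterozygous marker in cancerous and non-cancerous samples from the same individual can be sketched as an allelic ratio. This is an illustrative assumption rather than a procedure stated in the text: it takes the peak areas of the two alleles from capillary electrophoresis and divides the tumor allele ratio by the normal allele ratio. The function name and the peak areas are hypothetical, and any cut-off for calling an imbalance is assay-specific and not given here.

```python
def allelic_imbalance_ratio(tumor_peaks, normal_peaks):
    """Return the allelic imbalance ratio for one heterozygous marker.

    Each argument is a pair (allele_1_area, allele_2_area) of peak areas.
    Assumption: the normal sample corrects for allele-specific differences
    in PCR yield, so a ratio near 1 means no imbalance.
    """
    t1, t2 = tumor_peaks
    n1, n2 = normal_peaks
    if min(t1, t2, n1, n2) <= 0:
        raise ValueError("peak areas must be positive")
    ratio = (t1 / t2) / (n1 / n2)
    # Report a value >= 1 regardless of which allele is over-represented
    return ratio if ratio >= 1.0 else 1.0 / ratio


# Hypothetical peak areas: allele 1 is over-represented in the tumor
print(allelic_imbalance_ratio((4000, 1000), (1000, 1000)))  # prints 4.0
```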

In another embodiment the expression of these genes can be detected with DNA-arrays as 
described in WO9727317 and US6379895.

In a further embodiment the expression of these genes can be detected with bead-based direct
fluorescent readout techniques such as described in WO9714028 and WO9952708.

In one embodiment, the invention pertains to a method of determining the phenotype of a cell or 
tissue, comprising detecting the differential expression, relative to a normal or untreated cell, of at 
least one polynucleotide comprising SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19 or 21 to 26 or 53 to
75, wherein the polynucleotide is differentially expressed by at least about 1.5 fold, at least about 2
fold or at least about 3 fold.
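
The fold-change criterion can be made concrete with a small calculation. The sketch below is an assumption-laden illustration: it treats expression levels as positive linear-scale values, reports under-expression as a fold value of at least 1, and reads "at least about" as a plain greater-or-equal comparison. The function names and input values are hypothetical.

```python
def fold_change(sample_level, reference_level):
    """Fold difference between a sample and a normal or untreated reference.

    Assumption: both levels are positive, linear-scale expression values.
    Under-expression is reported as a fold value >= 1 as well.
    """
    if sample_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = sample_level / reference_level
    return ratio if ratio >= 1.0 else 1.0 / ratio


def highest_threshold_met(sample_level, reference_level,
                          thresholds=(1.5, 2.0, 3.0)):
    """Return the largest threshold reached, or None if none is reached."""
    fc = fold_change(sample_level, reference_level)
    met = [t for t in thresholds if fc >= t]
    return max(met) if met else None


# Hypothetical expression values
print(highest_threshold_met(300, 100))  # prints 3.0
print(highest_threshold_met(100, 250))  # prints 2.0
print(highest_threshold_met(120, 100))  # prints None
```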

In a further aspect the invention pertains to a method of determining the phenotype of a cell or 
tissue, comprising detecting the differential expression, relative to a normal or untreated cell, of at 
least one polynucleotide which hybridizes under stringent conditions to one of the polynucleotides 
of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19 or 21 to 26 or 53 to 75 and encodes a polypeptide
exhibiting the same biological function as given in Table 2 or 3 for the respective polynucleotide,
wherein the polynucleotide is differentially expressed by at least about 1.5 fold, at least
about 2 fold or at least about 3 fold.

In another embodiment of the invention a polynucleotide comprising a polynucleotide selected 
from SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19 or 21 to 26 and 53 to 75 or encoding one of the
polypeptides with SEQ ID NO: 28 to 32, 34, 35, 37 to 42, 44, 45 or 47 to 52 or 76 to 98 can be 
used to identify cells or tissue in individuals which exhibit a phenotype predisposed to breast 
cancer or a diseased phenotype, thereby (a) predicting whether an individual is at risk for the
development of, or (b) diagnosing whether an individual has, or (c) prognosing the progression
or the outcome of the treatment of malignant neoplasia and breast cancer in particular.

In yet another embodiment the invention provides a method for identifying genomic regions which 
are altered on the chromosomal level and encode genes that are linked by function and are 
differentially expressed in malignant neoplasia and breast cancer in particular.

In yet another embodiment the invention provides the genomic regions 17q21, 3p21 and 12q13 for
use in prediction, diagnosis and prognosis as well as prevention and treatment of malignant
neoplasia and breast cancer. In particular not only the intragenic regions, but also intergenic
regions, pseudogenes or non-transcribed genes of said chromosomal regions can be used for
diagnostic, predictive, prognostic, preventive and therapeutic compositions and methods.
Therefore sequences of coding or non-coding regions as depicted in this invention are offered by
way of illustration and not by way of limitation. As one aspect of this, genomic sequences in
between the genomic sequences depicted can be used for similar purposes.

In yet another embodiment the invention provides methods of screening for agents which regulate 
the activity of a polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 
to 98 or encoded by a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 
26 and 53 to 75. A test compound is contacted with a polypeptide comprising a polypeptide 
selected from SEQ ID NO: 27 to 52 and 76 to 98 or encoded by a polynucleotide comprising a 
polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75. Binding of the test compound to 
the polypeptide is detected. A test compound which binds to the polypeptide is thereby identified 
as a potential therapeutic agent for the treatment of malignant neoplasia and more particularly 
breast cancer. 

In yet another embodiment the invention provides another method of screening for agents which
regulate the activity of a polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52
and 76 to 98 or encoded by a polynucleotide comprising a polynucleotide selected from SEQ ID
NO: 1 to 26 and 53 to 75. A test compound is contacted with a polypeptide comprising a 
polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98 or encoded by a polynucleotide 
comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75. A biological activity 
mediated by the polypeptide is detected. A test compound which decreases the biological activity 
is thereby identified as a potential therapeutic agent for decreasing the activity of the polypeptide
comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to
98 or encoded by a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 
and 53 to 75 in malignant neoplasia and breast cancer in particular. A test compound which 
increases the biological activity is thereby identified as a potential therapeutic agent for increasing
the activity of the polypeptide selected from one of the polypeptides
with SEQ ID NO: 27 to 52 and 76 to 98 or encoded by a polynucleotide comprising a 
polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 in malignant neoplasia and breast 
cancer in particular. 

In another embodiment the invention provides a method of screening for agents which regulate the
activity of a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 
to 75. A test compound is contacted with a polynucleotide comprising a polynucleotide selected 
from SEQ ID NO: 1 to 26 and 53 to 75. Binding of the test compound to the polynucleotide 
comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 is detected. A test 
compound which binds to the polynucleotide is thereby identified as a potential therapeutic agent
for regulating the activity of a polynucleotide comprising a polynucleotide selected from SEQ ID 
NO: 1 to 26 and 53 to 75 in malignant neoplasia and breast cancer in particular. 

The invention thus provides polypeptides selected from one of the polypeptides with SEQ ID NO: 
27 to 52 and 76 to 98 or encoded by a polynucleotide comprising a polynucleotide selected from
SEQ ID NO: 1 to 26 and 53 to 75 which can be used to identify compounds which may act, for
example, as regulators or modulators such as agonists and antagonists, partial agonists, inverse 
agonists, activators, co-activators and inhibitors of the polypeptide comprising a polypeptide 
selected from SEQ ID NO: 27 to 52 and 76 to 98 or encoded by a polynucleotide comprising a 
polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75. Accordingly, the invention
provides reagents and methods for regulating a polypeptide comprising a polypeptide selected
from SEQ ID NO: 27 to 52 and 76 to 98 or encoded by a polynucleotide comprising a 
polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 in malignant neoplasia and more 
particularly breast cancer. The regulation can be an up- or down regulation. Reagents that 
modulate the expression, stability or amount of a polynucleotide comprising a polynucleotide
selected from SEQ ID NO: 1 to 26 and 53 to 75 or the activity of the polypeptide comprising a
polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98 or encoded by a polynucleotide 
comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 can be a protein, a 
peptide, a peptidomimetic, a nucleic acid, a nucleic acid analogue (e.g. peptide nucleic acid, locked 
nucleic acid) or a small molecule. Methods that modulate the expression, stability or amount of a
polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 or the
activity of the polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 
98 or encoded by a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 
and 53 to 75 can be gene replacement therapies, antisense, ribozyme and triplex nucleic acid 
approaches.

One embodiment of the invention provides antibodies which specifically bind to a full-length or
partial polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98 or 
encoded by a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 
53 to 75 or a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 
53 to 75 for use in prediction, prevention, diagnosis, prognosis and treatment of malignant
neoplasia and breast cancer in particular. 

Yet another embodiment of the invention is the use of a reagent which specifically binds to a 
polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 or a 
polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98 or encoded 
by a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75
in the preparation of a medicament for the treatment of malignant neoplasia and breast cancer in 
particular. 

Still another embodiment is the use of a reagent that modulates the activity or stability of a 
polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98 or encoded 
by a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75
or the expression, amount or stability of a polynucleotide comprising a polynucleotide selected 
from SEQ ID NO: 1 to 26 and 53 to 75 in the preparation of a medicament for the treatment of 
malignant neoplasia and breast cancer in particular. 

Still another embodiment of the invention is a pharmaceutical composition which includes a 
reagent which specifically binds to a polynucleotide comprising a polynucleotide selected from
SEQ ID NO: 1 to 26 and 53 to 75 or a polypeptide comprising a polypeptide selected from SEQ ID 
NO: 27 to 52 and 76 to 98 or encoded by a polynucleotide comprising a polynucleotide selected 
from SEQ ID NO: 1 to 26 and 53 to 75, and a pharmaceutically acceptable carrier. 

Yet another embodiment of the invention is a pharmaceutical composition including a 
polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 or
encoding a polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 
98. 

In one embodiment, a reagent which alters the level of expression in a cell of a polynucleotide 
comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 or encoding a 
polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98, or a
sequence complementary thereto, is identified by providing a cell, treating the cell with a test 
reagent, determining the level of expression in the cell of a polynucleotide comprising a 
polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 or encoding a polypeptide
comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98 or a sequence
complementary thereto, and comparing the level of expression of the polynucleotide in the treated cell
with the level of expression of the polynucleotide in an untreated cell, wherein a change in the 
level of expression of the polynucleotide in the treated cell relative to the level of expression of the 
polynucleotide in the untreated cell is indicative of an agent which alters the level of expression of
the polynucleotide in a cell. 

The invention further provides a pharmaceutical composition comprising a reagent identified by
this method. 

Another embodiment of the invention is a pharmaceutical composition which includes a 
polypeptide comprising a polypeptide selected from SEQ ID NO: 27 to 52 and 76 to 98 or which is
encoded by a polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 
53 to 75. 

A further embodiment of the invention is a pharmaceutical composition comprising a 
polynucleotide including a sequence which hybridizes under stringent conditions to a
polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 and
encoding a polypeptide exhibiting the same biological function as given for the respective 
polynucleotide in Table 2 or 3, or encoding a polypeptide comprising a polypeptide selected from 
SEQ ID NO: 27 to 52 and 76 to 98. Pharmaceutical compositions useful in the present invention
may further include fusion proteins comprising a polypeptide comprising a polypeptide selected
from SEQ ID NO: 27 to 52 and 76 to 98, or a fragment thereof, antibodies, or antibody fragments.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 shows a sketch of chromosome 17 with G-banding pattern and cytogenetic positions. In
the blow-up in the lower part of the figure a detailed view of the chromosomal area of the
long arm of chromosome 17 (17q12-21.1) is provided. Each vertical rectangle, depicted in
medium gray, represents a gene as labeled below or above the individual position. The
order of genes depicted in this graph has been deduced from experiments examining the
amplification and overexpression and from publicly available data (e.g. UCSC, NCBI or
Ensembl).

Fig. 2 shows the same region as depicted in Fig. 1 and a cluster representation of the
individual expression values measured by DNA-chip hybridization. The squares
representing genes are indicated by a dotted line. In the upper part of the cluster representation 4
tumor cell lines, of which two harbor a known HER-2/neu overexpression (SKBR3 and
AU565), are depicted with their individual expression profiles. Not only the HER-2/neu
gene shows a clear overexpression but, as provided by this invention, so do several other genes
in its surroundings. In the middle part of the cluster representation expression data
obtained from immunohistochemically characterized tumor samples are presented. Two of
the depicted probes show a significant overexpression of genes marked by the white
rectangles. For additional information and comparison, expression profiles of several
non-diseased human tissues (RNAs obtained from Clontech Inc.) are provided. Human brain
and neural tissue display the closest relation to the expression profile of HER-2/neu
positive tumors.

Fig. 3 provides data from DNA amplification measurements by qPCR (e.g. TaqMan™). The data
indicate that several analyzed breast cancer cell lines harbor amplification of genes
which are located in the previously described region (ARCHEON). Data are displayed
for each gene on the x-axis and 40-Ct on the y-axis. Data were normalized to the expression
level of GAPDH as seen in the first group of columns.

Fig. 4 represents a graphical overview of the amplified regions and provides information on the
length of the individual amplification and overexpression in the analyzed tumor cell lines.
The length of the amplification and the composition of genes have a significant impact on
the nature of the cancer cell and on the responsiveness to certain drugs, as described
elsewhere.

DETAILED DESCRIPTION OF THE INVENTION

DEFINITIONS

"Differential expression", as used herein, refers to both quantitative as well as qualitative 
differences in the genes' expression patterns depending on differential development and/or tumor
growth. Differentially expressed genes may represent "marker genes" and/or "target genes". The
expression pattern of a differentially expressed gene disclosed herein may be utilized as part of a
prognostic or diagnostic breast cancer evaluation. Alternatively, a differentially expressed gene 
disclosed herein may be used in methods for identifying reagents and compounds and uses of these 
reagents and compounds for the treatment of breast cancer as well as methods of treatment. 

"Biological activity" or "bioactivity" or "activity" or "biological function", which are used 
interchangeably, herein mean an effector or antigenic function that is directly or indirectly
performed by a polypeptide (whether in its native or denatured conformation), or by any fragment 
thereof in vivo or in vitro. Biological activities include but are not limited to binding to
polypeptides, binding to other proteins or molecules, enzymatic activity, signal transduction,
activity as a DNA binding protein, as a transcription regulator, ability to bind damaged DNA, etc. 
A bioactivity can be modulated by directly affecting the subject polypeptide. Alternatively, a 
bioactivity can be altered by modulating the level of the polypeptide, such as by modulating 
expression of the corresponding gene.

The term "marker" or "biomarker" refers to a biological molecule, e.g., a nucleic acid, peptide,
hormone, etc., whose presence or concentration can be detected and correlated with a known 
condition, such as a disease state. 

"Marker gene," as used herein, refers to a differentially expressed gene which expression pattern 
may be utilized as part of predictive, prognostic or diagnostic malignant neoplasia or breast cancer
evaluation, or which, alternatively, may be used in methods for identifying compounds useful for 
the treatment or prevention of malignant neoplasia and breast cancer in particular. A marker gene 
may also have the characteristics of a target gene. 

"Target gene", as used herein, refers to a differentially expressed gene involved in breast cancer in 
a manner by which modulation of the level of target gene expression or of target gene product
activity may act to ameliorate symptoms of malignant neoplasia and breast cancer in particular. A 
target gene may also have the characteristics of a marker gene. 

The term "biological sample", as used herein, refers to a sample obtained from an organism or
from components (e.g., cells) of an organism. The sample may be of any biological tissue or fluid.
Frequently the sample will be a "clinical sample" which is a sample derived from a patient. Such
samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine
needle biopsy samples, cell-containing body fluids, free-floating nucleic acids, urine, peritoneal
fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues 
such as frozen sections taken for histological purposes. 

By "array" or "matrix" is meant an arrangement of addressable locations or "addresses" on a
device. The locations can be arranged in two dimensional arrays, three dimensional arrays, or other 
matrix formats. The number of locations can range from several to at least hundreds of thousands. 
Most importantly, each location represents a totally independent reaction site. Arrays include but 
are not limited to nucleic acid arrays, protein arrays and antibody arrays. A "nucleic acid array"
refers to an array containing nucleic acid probes, such as oligonucleotides, polynucleotides or
larger portions of genes. The nucleic acid on the array is preferably single stranded. Arrays 
wherein the probes are oligonucleotides are referred to as "oligonucleotide arrays" or
"oligonucleotide chips." A "microarray," herein also referred to as a "biochip" or "biological chip", is an
array of regions having a density of discrete regions of at least about 100/cm², and preferably at
least about 1000/cm². The regions in a microarray have typical dimensions, e.g., diameters, in the
range of between about 10-250 µm, and are separated from other regions in the array by about the
same distance. A "protein array" refers to an array containing polypeptide probes or protein probes 
which can be in native form or denatured. An "antibody array" refers to an array containing
antibodies which include but are not limited to monoclonal antibodies (e.g. from a mouse), 
chimeric antibodies, humanized antibodies or phage antibodies and single chain antibodies as well 
as fragments from antibodies. 
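
The microarray dimensions given above can be related to the stated densities with a short calculation. This is a rough sketch that assumes round regions on a regular square grid, with the gap between neighbouring regions equal to one region diameter unless specified otherwise; the grid layout and the function name are assumptions, not taken from the text.

```python
def regions_per_cm2(diameter_um, gap_um=None):
    """Approximate density of regions on a square grid, in regions per cm².

    Assumption: the gap between neighbouring regions defaults to one
    diameter, following "separated ... by about the same distance".
    """
    if gap_um is None:
        gap_um = diameter_um
    pitch_um = diameter_um + gap_um      # centre-to-centre spacing
    regions_per_cm = 10_000 / pitch_um   # 1 cm = 10,000 micrometres
    return regions_per_cm ** 2


print(regions_per_cm2(250))  # prints 400.0
print(regions_per_cm2(10))   # prints 250000.0
```

Under these assumptions, the largest regions (250 µm) still exceed the minimum density of about 100/cm², while reaching about 1000/cm² requires a pitch of roughly 316 µm or less.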

The term "agonist", as used herein, is meant to refer to an agent that mimics or upregulates (e.g., 
potentiates or supplements) the bioactivity of a protein. An agonist can be a wild-type protein or
derivative thereof having at least one bioactivity of the wild-type protein. An agonist can also be a 
compound that upregulates expression of a gene or which increases at least one bioactivity of a 
protein. An agonist can also be a compound which increases the interaction of a polypeptide with 
another molecule, e.g., a target peptide or nucleic acid. 

The term "antagonist" as used herein is meant to refer to an agent that downregulates (e.g.,
suppresses or inhibits) at least one bioactivity of a protein. An antagonist can be a compound 
which inhibits or decreases the interaction between a protein and another molecule, e.g., a target 
peptide, a ligand or an enzyme substrate. An antagonist can also be a compound that 
downregulates expression of a gene or which reduces the amount of expressed protein present. 

"Small molecule" as used herein, is meant to refer to a composition, which has a molecular weight
of less than about 5 kD and most preferably less than about 4 kD. Small molecules can be nucleic 
acids, peptides, polypeptides, peptidomimetics, carbohydrates, lipids or other organic (carbon- 
containing) or inorganic molecules. Many pharmaceutical companies have extensive libraries of 
chemical and/or biological mixtures, often fungal, bacterial, or algal extracts, which can be
screened with any of the assays of the invention to identify compounds that modulate a bioactivity.

The terms "modulated" or "modulation" or "regulated" or "regulation" and "differentially
regulated" as used herein refer to both upregulation [i.e., activation or stimulation (e.g., by
agonizing or potentiating)] and downregulation [i.e., inhibition or suppression (e.g., by
antagonizing, decreasing or inhibiting)].

"Transcriptional regulatory unit" refers to DNA sequences, such as initiation signals, enhancers,
and promoters, which induce or control transcription of protein coding sequences with which they
are operably linked. In preferred embodiments, transcription of one of the genes is under the
control of a promoter sequence (or other transcriptional regulatory sequence) which controls the
expression of the recombinant gene in a cell-type in which expression is intended. It will also be
understood that the recombinant gene can be under the control of transcriptional regulatory 
sequences which are the same or which are different from those sequences which control 
transcription of the naturally occurring forms of the polypeptide. 

The term "derivative" refers to the chemical modification of a polypeptide sequence, or a
polynucleotide sequence. Chemical modifications of a polynucleotide sequence can include, for
example, replacement of hydrogen by an alkyl, acyl, or amino group. A derivative polynucleotide
encodes a polypeptide which retains at least one biological or immunological function of the
natural molecule. A derivative polypeptide is one modified by glycosylation, pegylation, or any
similar process that retains at least one biological or immunological function of the polypeptide
from which it was derived.

The term "nucleotide analog" refers to oligomers or polymers being at least in one feature different 
from naturally occurring nucleotides, oligonucleotides or polynucleotides, but exhibiting 
functional features of the respective naturally occurring nucleotides (e.g. base pairing,
hybridization, coding information) and that can be used for said compositions. The nucleotide
analogs can consist of non-naturally occurring bases or polymer backbones, examples of which are 
LNAs, PNAs and Morpholinos. The nucleotide analog has at least one molecule different from its 
naturally occurring counterpart or equivalent. 

"BREAST CANCER GENES" or "BREAST CANCER GENE" as used herein refers to the 
polynucleotides of SEQ ID NO: 1 to 26 and 53 to 75, as well as derivatives, fragments, analogs
and homologues thereof, the polypeptides encoded thereby, the polypeptides of SEQ ID NO: 27 to
52 and 76 to 98 as well as derivatives, fragments, analogs and homologues thereof and the
corresponding genomic transcription units which can be derived or identified with standard
techniques well known in the art using the information disclosed in Tables 1 to 5 and Figures 1 to
4. The GenBank, LocusLink ID and the UniGene accession numbers of the polynucleotide
sequences of the SEQ ID NO: 1 to 26 and 53 to 75 and the polypeptides of the SEQ ID NO: 27 to
52 and 76 to 98 are shown in Table 1; the gene description, gene function and subcellular
localization are given in Tables 2 and 3.

The term "chromosomal region" as used herein refers to a consecutive DNA stretch on a
chromosome which can be defined by cytogenetic or other genetic markers such as e.g. restriction
fragment length polymorphisms (RFLPs), single nucleotide polymorphisms (SNPs), expressed sequence
tags (ESTs), sequence tagged sites (STSs), microsatellites, variable number of tandem repeats (VNTRs)
and genes. Typically a chromosomal region consists of up to 2 Megabases (MB), up to 4 MB, up
to 6 MB, up to 8 MB, up to 10 MB, up to 20 MB or even more MB.

The term "altered chromosomal region" or "aberrant chromosomal region" refers to a structural
change of the chromosomal composition and DNA sequence, which can occur by the following
events: amplifications, deletions, inversions, insertions, translocations and/or viral integrations. A
trisomy, where a given cell harbors more than two copies of a chromosome, is within the meaning
of the term "amplification" of a chromosome or chromosomal region.

The present invention provides polynucleotide sequences and proteins encoded thereby, as well as 
probes derived from the polynucleotide sequences, antibodies directed to the encoded proteins, and 
predictive, preventive, diagnostic, prognostic and therapeutic uses for individuals who are at risk
for or who have malignant neoplasia and breast cancer in particular. The sequences disclosed
herein have been found to be differentially expressed in samples from breast cancer.

The present invention is based on the identification of 43 genes that are differentially regulated 
(up- or downregulated) in tumor biopsies of patients with clinical evidence of breast cancer. The 
identification of 43 human genes which were not known to be differentially regulated in breast 
cancer states and their significance for the disease is described in the working examples herein. 
The characterization of the co-expression of these genes provides newly identified roles in breast
cancer. The gene names, the database accession numbers (GenBank and UniGene) as well as the 
putative or known functions of the encoded proteins and their subcellular localization are given in 
Tables 1 to 4. The primer sequences used for the gene amplification are shown in Table 5. 

In either situation, detecting expression of these genes at a higher or at a lower level as compared
to normal expression provides the basis for the diagnosis of malignant neoplasia and breast cancer.
Furthermore, in testing the efficacy of compounds during clinical trials, a decrease in the level of 
the expression of these genes corresponds to a return from a disease condition to a normal state, 
and thereby indicates a positive effect of the compound. 
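
By way of illustration only, the comparison of a measured expression level with normal expression can be sketched as a log2 fold change calculation. The function names, the example values and the two-fold threshold are assumptions introduced for this sketch; they are not taken from the working examples.

```python
import math


def log2_fold_change(sample_level: float, normal_level: float) -> float:
    """Return log2(sample / normal) for two positive expression levels."""
    if sample_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(sample_level / normal_level)


def classify_expression(sample_level: float, normal_level: float,
                        threshold: float = 1.0) -> str:
    """Label a gene as 'up', 'down' or 'unchanged' relative to normal.

    threshold is a hypothetical cut-off in log2 units (1.0 = two-fold).
    """
    lfc = log2_fold_change(sample_level, normal_level)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"
```

Under these assumptions, a gene whose level falls from "up" before treatment to "unchanged" after treatment would be read as moving back toward the normal state.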

Another aspect of the present invention is based on the observation that neighboring genes within
defined genomic regions functionally interact and influence each other's function directly or
indirectly. A genomic region encoding functionally interacting genes that are co-amplified and
co-expressed in neoplastic lesions has been defined as an "ARCHEON" (ARCHEON = Altered
Region of Changed Chromosomal Expression Observed in Neoplasms). Chromosomal alterations
often affect more than one gene. This is true for amplifications, duplications, insertions,
integrations, inversions, translocations, and deletions. These changes can have influence on the
expression level of single or multiple genes. Most commonly in the field of cancer diagnostics and
treatment the changes of expression levels have been investigated for single, putatively relevant
target genes such as MLVT2 (5p14), NRASL3 (6p12), EGFR (7p12), c-myc (8q23), Cyclin D1
(11q13), IGF1R (15q25), HER-2/neu (17q21), PCNA (20q12). However, the altered expression
level and interaction of multiple (i.e. more than two) genes within one genomic region with each
other has not been addressed. Genes of an ARCHEON form gene clusters with tissue specific 
expression patterns. The mode of interaction of individual genes within such a gene cluster 
suspected to represent an ARCHEON can be either protein-protein or protein-nucleic acid 
interaction, which may be illustrated but not limited by the following examples: ARCHEON gene 
interaction may be in the same signal transduction pathway, may be receptor to ligand binding, 
receptor kinase and SH2 or SH3 binding, transcription factor to promoter binding, nuclear 
hormone receptor to transcription factor binding, phosphogroup donation (e.g. kinases) and 
acceptance (e.g. phosphoprotein), mRNA stabilizing protein binding and transcriptional processes. 
The individual activity and specificity of a pair of genes and/or the proteins encoded thereby, or of a
group of such in a higher order, may be readily deduced by the skilled person from literature
published or deposited within public databases. However, in the context of an ARCHEON the
interaction of members being part of an ARCHEON will potentiate, exaggerate or reduce their
singular functions. This interaction is of importance in defined normal tissues in which they are
normally co-expressed. Therefore, these clusters have been commonly conserved during evolution.
The aberrant expression of members of these ARCHEONs in neoplastic lesions, however,
(especially within tissues in which they are normally not expressed) has influence on tumor 
characteristics such as growth, invasiveness and drug responsiveness. Due to the interaction of 
these neighboring genes it is of importance to determine the members of the ARCHEON which are 
involved in the deregulation events. In this regard amplification and deletion events in neoplastic 
lesions are of special interest. 

The invention relates to a method for the detection of chromosomal alterations by (a) determining 
the relative mRNA abundance of individual mRNA species or (b) determining the copy number of 
one or more chromosomal region(s) by quantitative PCR. In one embodiment information on the 
genomic organization and spatial regulation of chromosomal regions is assessed by bioinformatic 
analysis of the sequence information of the human genome (UCSC, NCBI) and then combined 
with RNA expression data from GeneChip™ DNA-Arrays (Affymetrix) and/or quantitative PCR
(TaqMan) from RNA-samples or genomic DNA. 
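
As a hedged illustration of option (b), relative copy number can be estimated from quantitative PCR threshold cycles (Ct) with the widely used comparative Ct calculation. The text does not state which calculation is applied; the comparative Ct approach, the assumption of roughly 100% amplification efficiency, the diploid calibrator sample and all names and values below are assumptions made for this sketch.

```python
def relative_quantity(ct_target_sample: float, ct_reference_sample: float,
                      ct_target_calibrator: float,
                      ct_reference_calibrator: float) -> float:
    """Comparative Ct estimate: 2 ** -(delta delta Ct).

    The target is the gene or region of interest, the reference is a
    locus assumed to be unaltered, and the calibrator is a normal sample.
    Assumes about 100% PCR efficiency for both amplicons.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


def estimated_copy_number(ct_target_sample: float, ct_reference_sample: float,
                          ct_target_calibrator: float,
                          ct_reference_calibrator: float,
                          calibrator_copies: float = 2.0) -> float:
    """Scale the relative quantity by the calibrator's copy number.

    calibrator_copies defaults to 2.0 (hypothetical diploid normal DNA).
    """
    return calibrator_copies * relative_quantity(
        ct_target_sample, ct_reference_sample,
        ct_target_calibrator, ct_reference_calibrator)
```

With these assumptions, a target that crosses the threshold three cycles earlier than the reference in the tumor sample, but at the same cycle in the calibrator, corresponds to an eight-fold relative increase.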

In a further embodiment the functional relationship of genes located on a chromosomal region 
which is altered (amplified or deleted) is established. The altered chromosomal region is defined as 
an ARCHEON if genes located on that region functionally interact. 

The 17q21 locus was investigated as one model system, harboring the HER-2/neu gene. By 
establishing a high-resolution assay to detect amplification events in neighboring genes, 43 genes 
that are commonly co-amplified in breast cancer cell lines and patient samples were identified. By
gene array technologies and immunological methods their co-overexpression in tumor samples was
demonstrated. Surprisingly, by clustering tissue samples with HER-2/neu positive tumor samples,
it was found that the expression pattern of this larger genomic region (consisting of 43 genes) is
very similar to control brain tissue. HER-2/neu negative breast tumor tissue did not show a similar
expression pattern. Indeed, some of the genes within this cluster are important for neural
development (HER-2/neu, THRA) in mouse model systems or are described to be expressed in
neural cells (NeuroD2). Moreover, by searching similar gene combinations in the human and
rodent genome additional homologous chromosomal regions on chromosome 3p21 and 12q13
harboring several isoforms of the respective genes (see below) were found. There was strong
evidence for multiple interactions between the 43 candidate genes, as being part of identical
pathways (HER-2/neu, GRB7, CrkRS, CDC6), influencing the expression of each other
(HER-2/neu, THRA, RARA), interacting with each other (PPARGBP, THRA, RARA, NR1D1 or
HER-2/neu, GRB7) or expressed in defined tissues (CACNB1, PPARGBP, etc.). Interestingly, the
genomic regions of the ARCHEONs that were identified are amplified in acquired Tamoxifen
resistance of HER-2/neu negative cells (MCF7), which are normally sensitive to Tamoxifen
treatment [Achuthan et al., 2001, (2)].

Moreover, altered responsiveness to treatment due to the alterations of the genes within these 
ARCHEONs was observed. Surprisingly, genes within the ARCHEONs are of importance even in 
the absence of HER-2/neu homologues. Some of the genes within the ARCHEONs do not only
serve as marker genes for prognostic purposes, but have already been known as targets for
therapeutic intervention. For example, TOP2 alpha is a target of anthracyclines. THRA and RARA
can be targeted by hormones and hormone analogs (e.g. T3, rT3, RA). Due to their high affinity
binding sites and available screening assays (reporter assays based on their transcriptional
potential) the hormone receptors which are shown to be linked to neoplastic pathophysiology for
the first time herein are ideal targets for drug screening and treatment of malignant neoplasia and
breast cancer in particular. In this regard it is essential to know which members of the ARCHEON
are altered in the neoplastic lesions. Particularly it is important to know the nature, number and
extent to which the ARCHEON genes are amplified or deleted. The ARCHEONs are flanked by
similar, endogenous retroviruses (e.g. HERV-K = "human endogenous retrovirus"), some of which
are activated in breast cancer. These viruses may have also been involved in the evolutionary
duplication of the ARCHEONs.

The analysis of the 17q21 region confirmed data obtained by IHC and identified several additional
genes being co-amplified with the HER-2/neu gene. Comparative analysis of RNA-based
quantitative RT-PCR (TaqMan) with DNA-based qPCR from tumor cell lines identified the same
amplified region. Genes at the 17q11.2-21 region are offered by way of illustration, not by way of
limitation. A graphical display of the described chromosomal region is provided in Figure 1. 

Biological relevance of the genes which are part of the 17q21 ARCHEON 

MLN50 

By differential screening of cDNAs from breast cancer-derived metastatic axillary lymph nodes,
TRAF4 and 3 other novel genes (MLN51, MLN62, MLN64) were identified that are
overexpressed in breast cancer [Tomasetto et al., 1995, (3)]. One gene, which the authors designated
MLN50, was mapped to 17q11-q21.3 by radioactive in situ hybridization. In breast cancer cell
lines, overexpression of the 4 kb MLN50 mRNA was correlated with amplification of the gene and
with amplification and overexpression of ERBB2, which maps to the same region. The authors
suggested that the 2 genes belong to the same amplicon. Amplification of chromosomal region
17q11-q21 is one of the most common events occurring in human breast cancers. They reported
that the predicted 261-amino acid MLN50 protein contains an N-terminal LIM domain and a
C-terminal SH3 domain. They renamed the protein LASP1, for "LIM and SH3 protein". Northern blot
analysis revealed that LASP1 mRNA was expressed at a basal level in all normal tissues examined
and overexpressed in 8% of primary breast cancers. In most of these cancers, LASP1 and ERBB2
were simultaneously overexpressed.

MLLT6 

The MLLT6 (AF17) gene encodes a protein of 1,093 amino acids, containing a leucine-zipper
dimerization motif located 3-prime of the fusion point and a cysteine-rich domain at the N
terminus. AF17 was found to contain stretches of amino acids previously associated with domains
involved in transcriptional repression or activation.

Chromosome translocations involving band 11q23 are associated with approximately 10% of
patients with acute lymphoblastic leukemia (ALL) and more than 5% of patients with acute
myeloid leukemia (AML). The gene at 11q23 involved in the translocations is variously designated
ALL1, HRX, MLL, and TRX1. The partner gene in one of the rarer translocations,
t(11;17)(q23;q21), is designated MLLT6 and lies on 17q12.

ZNF144 (Mel18)

Mel18 cDNA encodes a novel cys-rich zinc finger motif. The gene is expressed strongly in most
tumor cell lines, but its normal tissue expression was limited to cells of neural origin and was
especially abundant in fetal neural cells. It belongs to a RING-finger motif family which includes
BMI1. The MEL18/BMI1 gene family represents a mammalian homolog of the Drosophila
'polycomb' gene group, thereby belonging to a memory mechanism involved in maintaining the
expression pattern of key regulatory factors such as Hox genes. The Bmi1, Mel18 and M33 genes are
representative examples of mouse Pc-G genes. Common phenotypes observed in knockout mice
mutant for each of these genes indicate an important role for Pc-G genes not only in regulation of
Hox gene expression and axial skeleton development but also in control of proliferation and
survival of haematopoietic cell lineages. This is in line with the proliferative deregulation
observed in lymphoblastic leukemia. The MEL18 gene is conserved among vertebrates. Its mRNA
is expressed at high levels in placenta, lung, and kidney, and at lower levels in liver, pancreas, and
skeletal muscle. Interestingly, cervical and lumbo-sacral HOX gene expression is altered in several
primary breast cancers with respect to normal breast tissue, with the HoxB gene cluster being
present on 17q distal to the 17q21 locus. Moreover, delay of differentiation with persistent nests of
proliferating cells was found in endothelial cells cocultured with HOXB7-transduced SkBr3 cells,
which exhibit a 17q21 amplification. Tumorigenicity of these cells has been evaluated in vivo.
Xenografts in athymic nude mice showed that SkBr3/HOXB7 cells developed tumors with an
increased number of blood vessels, whether the mice were irradiated or not, whereas parental SkBr3
cells did not show any tumor take unless mice were sublethally irradiated. As part of this invention,
we have found MEL18 to be overexpressed specifically in tumors bearing Her-2/neu gene
amplification, which can be critical for Hox expression.

PHOSPHATIDYLINOSITOL-4-PHOSPHATE 5-KINASE, TYPE II, BETA; PIP5K2B

Phosphoinositide kinases play central roles in signal transduction.
Phosphatidylinositol-4-phosphate 5-kinases (PIP5Ks) phosphorylate phosphatidylinositol 4-phosphate,
giving rise to phosphatidylinositol 4,5-bisphosphate. The PIP5K enzymes exist as multiple isoforms
that have various immunoreactivities, kinetic properties, and molecular masses. They are unique in
that they possess almost no homology to the kinase motifs present in other phosphatidylinositol,
protein, and lipid kinases. By screening a human fetal brain cDNA library with the PIP5K2B EST the
full length gene could be isolated. The deduced 416-amino acid protein is 78% identical to PIP5K2A.
Using SDS-PAGE, the authors estimated that bacterially expressed PIP5K2B has a molecular mass
of 47 kD. Northern blot analysis detected a 6.3-kb PIP5K2B transcript which was abundantly
expressed in several human tissues. PIP5K2B interacts specifically with the juxtamembrane region
of the p55 TNF receptor (TNFR1) and PIP5K2B activity is increased in mammalian cells by
treatment with TNF-alpha. A modeled complex with membrane-bound substrate and ATP shows
how a phosphoinositide kinase can phosphorylate its substrate in situ at the membrane
interface. The substrate-binding site is open on 1 side, consistent with dual specificity for
phosphatidylinositol 3- and 5-phosphates. Although the amino acid sequence of PIP5K2A does not
show homology to known kinases, recombinant PIP5K2A exhibited kinase activity. PIP5K2A
contains a putative Src homology 3 (SH3) domain-binding sequence. Overexpression of mouse
PIP5K1B in COS7 cells induced an increase in short actin fibers and a decrease in actin stress
fibers.

TEM7

Using serial analysis of gene expression (SAGE), partial cDNAs corresponding to several tumor
endothelial markers (TEMs) that displayed elevated expression during tumor angiogenesis could
be identified. Among the genes identified was TEM7. Using database searches and 5-prime RACE,
the entire TEM7 coding region, which encodes a 500-amino acid type I transmembrane protein, has
been described. The extracellular region of TEM7 contains a plexin-like domain and has weak
homology to the ECM protein nidogen. The function of these domains, which are usually found in
secreted and extracellular matrix molecules, is unknown. Nidogen itself belongs to the entactin
protein family and helps to determine pathways of migrating axons by switching from
circumferential to longitudinal migration. Entactin is involved in cell migration, as it promotes
trophoblast outgrowth, through a mechanism mediated by the RGD recognition site, and plays an
important role during invasion of the endometrial basement membrane at implantation. As entactin
promotes thymocyte adhesion but affects thymocyte migration only marginally, it is suggested that
entactin may play a role in thymocyte localization during T cell development.

In situ hybridization analysis of human colorectal cancer demonstrated that TEM7 was expressed 
clearly in the endothelial cells of the tumor stroma but not in the endothelial cells of normal
colonic tissue. Using in situ hybridization to assay expression in various normal adult mouse 
tissues, they observed that TEM7 was largely undetectable in mouse tissues or tumors, but was 
abundantly expressed in mouse brain. 

ZNFN1A3 

By screening a B-cell cDNA library with a mouse Aiolos N-terminal cDNA probe, a cDNA
encoding human Aiolos, or ZNFN1A3, was obtained. The deduced 509-amino acid protein, which
is 86% identical to its mouse counterpart, has 4 DNA-binding zinc fingers in its N terminus and 2
zinc fingers that mediate protein dimerization in its C terminus. These domains are 100% and 96%
homologous to the corresponding domains in the mouse protein, respectively. Northern blot
analysis revealed strong expression of a major 11.0- and a minor 4.4-kb ZNFN1A3 transcript in
peripheral blood leukocytes, spleen, and thymus, with lower expression in liver, small intestine,
and lung.

Ikaros (ZNFN1A1), a hemopoietic zinc finger DNA-binding protein, is a central regulator of
lymphoid differentiation and is implicated in leukemogenesis. The execution of normal function of
Ikaros requires sequence-specific DNA binding, transactivation, and dimerization domains. Mice
with a mutation in a related zinc finger protein, Aiolos, are prone to B-cell lymphoma. In
chemically induced murine lymphomas, allelic losses on markers surrounding the Znfn1a1 gene
were detected in 27% of the tumors analyzed. Moreover, specific Ikaros expression was found in
primary mouse hormone-producing anterior pituitary cells and was essential for fibroblast growth
factor receptor 4 (FGFR4) expression, which itself is implicated in a multitude of endocrine cell
hormonal and proliferative properties, with FGFR4 being differentially expressed in normal and
neoplastic pituitary. Moreover, Ikaros binds to chromatin remodelling complexes containing
SWI/SNF proteins, which antagonize Polycomb function. Interestingly, at the telomeric end of the
disclosed ARCHEON the SWI/SNF complex member SMARCE1 (= SWI/SNF-related,
matrix-associated, actin-dependent regulators of chromatin) is located and is part of the described
amplification. Due to the related binding specificities of Ikaros and Palindrome Binding Protein
(PBP), it is suggested that ZNFN1A3 is able to regulate the Her-2/neu enhancer.

PPP1R1B 

Midbrain dopaminergic neurons play a critical role in multiple brain functions, and abnormal 
signaling through dopaminergic pathways has been implicated in several major neurologic and 
psychiatric disorders. One well-studied target for the actions of dopamine is DARPP32. In the 
densely dopamine- and glutamate-innervated rat caudate-putamen, DARPP32 is expressed in
medium-sized spiny neurons that also express dopamine D1 receptors. The function of DARPP32
seems to be regulated by receptor stimulation. Both dopaminergic and glutamatergic (NMDA) 
receptor stimulation regulate the extent of DARPP32 phosphorylation, but in opposite directions. 

The human DARPP32 was isolated from a striatal cDNA library. The 204-amino acid DARPP32
protein shares 88% and 85% sequence identity, respectively, with bovine and rat DARPP32
proteins. The DARPP32 sequence is particularly conserved through the N terminus, which
represents the active portion of the protein. Northern blot analysis demonstrated that the 2.1-kb
DARPP32 mRNA is more highly expressed in human caudate than in cortex. In situ hybridization
to postmortem human brain showed a low level of DARPP32 expression in all neocortical layers,
with the strongest hybridization in the superficial layers. CDK5 phosphorylated DARPP32 in vitro
and in intact brain cells. Phospho-thr75 DARPP32 inhibits PKA in vitro by a competitive
mechanism. Decreasing phospho-thr75 DARPP32 in striatal cells either by a CDK5-specific
inhibitor or by using genetically altered mice resulted in increased dopamine-induced
phosphorylation of PKA substrates and augmented peak voltage-gated calcium currents. Thus,
DARPP32 is a bifunctional signal transduction molecule which, by distinct mechanisms, controls a
serine/threonine kinase and a serine/threonine phosphatase.

DARPP32 and t-DARPP are overexpressed in gastric cancers. It has been suggested that overexpression
of these 2 proteins in gastric cancers may provide an important survival advantage to neoplastic cells.
It could be demonstrated that Darpp32 is an obligate intermediate in progesterone-facilitated
sexual receptivity in female rats and mice. The facilitative effect of progesterone on sexual
receptivity in female rats was blocked by antisense oligonucleotides to Darpp32. Homozygous
mice carrying a null mutation for the Darpp32 gene exhibited minimal levels of
progesterone-facilitated sexual receptivity when compared to their wildtype littermates, and
progesterone significantly increased hypothalamic cAMP levels and cAMP-dependent protein kinase
activity.

CACNB1 

In 1991 a cDNA clone encoding a protein with high homology to the beta subunit of the rabbit
skeletal muscle dihydropyridine-sensitive calcium channel was isolated from a rat brain cDNA
library [Pragnell et al., 1991, (4)]. This rat brain beta-subunit cDNA hybridized to a 3.4-kb
message that was expressed in high levels in the cerebral hemispheres and hippocampus and much
lower levels in cerebellum. The open reading frame encodes 597 amino acids with a predicted mass
of 65,679 Da which is 82% homologous with the skeletal muscle beta subunit. The corresponding
human beta-subunit gene was localized to chromosome 17 by analysis of somatic cell hybrids. The
authors suggested that the encoded brain beta subunit, which has a primary structure highly similar
to its isoform in skeletal muscle, may have a comparable role as an integral regulatory component of
a neuronal calcium channel.

RPL19 

The ribosome is the only organelle conserved between prokaryotes and eukaryotes. In eukaryotes, 
this organelle consists of a 60S large subunit and a 40S small subunit. The mammalian ribosome
contains 4 species of RNA and approximately 80 different ribosomal proteins, most of which
appear to be present in equimolar amounts. In mammalian cells, ribosomal proteins can account for
up to 15% of the total cellular protein, and the expression of the different ribosomal protein genes,
which can account for up to 7 to 9% of the total cellular mRNAs, is coordinately regulated to meet
the cell's varying requirements for protein synthesis. The mammalian ribosomal protein genes are
members of multigene families, most of which are composed of multiple processed pseudogenes
and a single functional intron-containing gene. The presence of multiple pseudogenes hampered
the isolation and study of the functional ribosomal protein genes. By study of somatic cell hybrids,
it has been elucidated that DNA sequences complementary to 6 mammalian ribosomal protein
cDNAs could be assigned to chromosomes 5, 8, and 17. Ten fragments mapped to 3 chromosomes
[Nakamichi et al., 1986, (5)]. These are probably a mixture of functional (expressed) genes and
pseudogenes. One that maps to 5q23-q33 rescues Chinese hamster emetine-resistance mutations in
interspecies hybrids and is therefore the transcriptionally active RPS14 gene. In 1989 a PCR-based
strategy for the detection of intron-containing genes in the presence of multiple pseudogenes
was described. This technique was used to identify the intron-containing PCR products of 7 human
ribosomal protein genes and to map their chromosomal locations by hybridization to human/rodent
somatic cell hybrids [Feo et al., 1992, (6)]. All 7 ribosomal protein genes were found to be on
different chromosomes: RPL19 on 17p12-q11; RPL30 on 8; RPL35A on 18; RPL36A on 14; RPS6
on 9pter-p13; RPS11 on 19cen-qter; and RPS17 on 11pter-p13. These are also different sites from
the chromosomal location of previously mapped ribosomal protein genes S14 on chromosome 5,
S4 on Xq and Yp, and RP1 17A on 9q3-q34. By fluorescence in situ hybridization the position of
the RPL19 gene was mapped to 17q11 [Davies et al., 1989, (7)].

PPAJRBP. PBP. CRSPL CRSP200. TRIP2. TRAP220. RBI 8 A. DRIP230 

15 The thyroid hormone receptors (TRs) are hormone-dependent transcription factors that regulate 
expression of a variety of specific target genes. They must specifically interact with a number of 
proteins as they progress from their initial translation and nuclear translocation to 
heterodimerization with retinoid X receptors (RXRs), functional interactions with other 
transcription factors and the basic transcriptional apparatus, and eventually, degradation. To help 

20 elucidate the mechanisms that underlie the transcriptional effects and other potential functions of 
TRs, the yeast interaction trap, a version of the yeast 2-hybrid system, was used to identify 
proteins that specifically interact with the ligand-binding domain of rat TR-beta-1 (THRB) [Lee et 
al., 1995, (8)]. The authors isolated HeLa cell cDNAs encoding several different TR-interacting 
proteins (TRIPs), including TRIP2. TRIP2 interacted with rat Thrb only in the presence of thyroid 

25 hormone. It showed a ligand-independent interaction with RXR-alpha, but did not interact with the 
glucocorticoid receptor (NR3C1) under any condition. By immunoscreening a human B-lymphoma 
cell cDNA expression library with the anti-p53 monoclonal antibody PAM801, PPARBP was 
identified, which was called RB18A for Recognized by PAM801 monoclonal antibody' [Drane et 
al., 1997, (9)]. The predicted 1,566-amino acid RB18A protein contains several potential nuclear 

30 localization signals, 13 potential N-glycosylation sites, and a high number of potential 
phosphorylation sites. Despite sharing common antigenic determinants with p53, RB18A does not 
show significant nucleotide or amino acid sequence similarity with p53. Whereas fee calculated 
molecular mass of RB18A is 166 kD, the apparent mass of recombinant RB18A was 205 kD by 
SDS-PAGE analysis. The authors demonstrated that RB18A shares functional properties with p53, 

35 including DNA binding, p53 binding, and self-oligomerization. Furthermore, RB18A was able to 
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activate the sequence-specific binding of p53 to DNA, which was induced through an unstable 
interaction between both proteins. Northern blot analysis of human tissues detected an 8.5-kb 
RB18A transcript in all tissues examined except kidney, with highest expression in heart. 
Moreover, mouse Pparbp, which was called Pbp for 'Ppar-binding protein', was identified as a
protein that interacts with the Ppar-gamma (PPARG) ligand-binding domain in a yeast 2-hybrid
system [Zhu et al., 1997, (10)]. The authors found that Pbp also binds to PPAR-alpha (PPARA),
RAR-alpha (RARA), RXR, and TR-beta-1 in vitro. The binding of Pbp to these receptors
increased in the presence of specific ligands. Deletion of the last 12 amino acids from the C
terminus of PPAR-gamma resulted in the abolition of interaction between Pbp and PPAR-gamma.
Pbp modestly increased the transcriptional activity of PPAR-gamma, and a truncated form of Pbp
acted as a dominant-negative repressor, suggesting that Pbp is a genuine transcriptional
co-activator for PPAR. The predicted 1,560-amino acid Pbp protein contains 2 LXXLL motifs, which
are considered necessary and sufficient for the binding of several co-activators to nuclear
receptors. Northern blot analysis detected Pbp expression in all mouse tissues examined, with
higher levels in liver, kidney, lung, and testis. In situ hybridization showed that Pbp is expressed
during mouse ontogeny, suggesting a possible role for Pbp in cellular proliferation and
differentiation. In adult mouse, in situ hybridization detected Pbp expression in liver, bronchial
epithelium in the lung, intestinal mucosa, kidney cortex, thymic cortex, splenic follicles, and
seminiferous epithelium in testis. Later on, PPARBP, which was called TRAP220, was identified from
an immunopurified TR-alpha (THRA)-TRAP complex [Yuan et al., 1998, (11)]. The authors cloned
Jurkat cell cDNAs encoding TRAP220. The predicted 1,581-amino acid TRAP220 protein
contains LXXLL domains, which are found in other nuclear receptor-interacting proteins.
TRAP220 is nearly identical to RB18A, with these proteins differing primarily by an extended N
terminus on TRAP220. In the absence of TR-alpha, TRAP220 appears to reside in a single
complex with other TRAPs. TRAP220 showed a direct ligand-dependent interaction with
TR-alpha, which was mediated through the C terminus of TR-alpha and, at least in part, the LXXLL
domains of TRAP220. TRAP220 also interacted with other nuclear receptors, including vitamin D
receptor, RARA, RXRA, PPARA, PPARG, and estrogen receptor-alpha (ESR1; 133430), in a
ligand-dependent manner. TRAP220 moderately stimulated human TR-alpha-mediated
transcription in transfected cells, whereas a fragment containing the LXXLL motifs acted as a
dominant-negative inhibitor of nuclear receptor-mediated transcription both in transfected cells
and in cell-free transcription systems. Further studies indicated that TRAP220 plays a major role in
anchoring other TRAPs to TR-alpha during the function of the TR-alpha-TRAP complex and that
TRAP220 may be a global co-activator for the nuclear receptor superfamily. PBP, a nuclear
receptor co-activator, interacts with estrogen receptor-alpha (ESR1) in the absence of estrogen.
This interaction was enhanced in the presence of estrogen, but was reduced in the presence of the
anti-estrogen Tamoxifen. Transfection of PBP into cultured cells resulted in enhancement of
estrogen-dependent transcription, indicating that PBP serves as a co-activator in estrogen receptor
signaling. To examine whether overexpression of PBP plays a role in breast cancer because of its
co-activator function in estrogen receptor signaling, the levels of PBP expression in breast tumors
were determined [Zhu et al., 1999, (12)]. High levels of PBP expression were detected in
approximately 50% of primary breast cancers and breast cancer cell lines by ribonuclease
protection analysis, in situ hybridization, and immunoperoxidase staining. By using FISH, the
authors mapped the PBP gene to 17q12, a region that is amplified in some breast cancers. They
found PBP gene amplification in approximately 24% (6 of 25) of breast tumors and approximately
30% (2 of 6) of breast cancer cell lines, implying that PBP gene overexpression can occur
independent of gene amplification. They determined that the PBP gene comprises 17 exons that
together span more than 37 kb. Their findings, in particular PBP gene amplification, suggested that
PBP, by its ability to function as an estrogen receptor-alpha co-activator, may play a role in
mammary epithelial differentiation and in breast carcinogenesis.
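
The amplification frequencies quoted above are simple proportions of the reported counts. As a
minimal sketch (the helper name `percent` is hypothetical and not taken from the cited study), the
counts can be converted to percentages as follows. Note that 2 of 6 works out to roughly 33%,
which the text gives as approximately 30%.

```python
def percent(hits: int, total: int) -> float:
    """Return hits/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


# Counts as reported in the text: amplified samples / samples examined.
tumors_amplified = percent(6, 25)
cell_lines_amplified = percent(2, 6)

print(f"breast tumors: {tumors_amplified:.1f}%")
print(f"cell lines:    {cell_lines_amplified:.1f}%")
```

With only 6 cell lines examined, the second figure is a coarse estimate, which may be why it is
stated loosely.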

NEUROD2

Basic helix-loop-helix (bHLH) proteins are transcription factors involved in determining cell type
during development. In 1995 a bHLH protein was described, termed NeuroD (for 'neurogenic
differentiation'), that functions during neurogenesis [Lee et al., 1995, (13)]. The human NEUROD
gene maps to chromosome 2q32. The cloning and characterization of 2 additional NEUROD
genes, NEUROD2 and NEUROD3, was described in 1996 [McCormick et al., 1996, (14)].
Sequences for the mouse and human homologues were presented. NEUROD2 shows a high degree
of homology to the bHLH region of NEUROD, whereas NEUROD3 is more distantly related. The
authors found that mouse neuroD2 was initially expressed at embryonic day 11, with persistent
expression in the adult nervous system. Similar to neuroD, neuroD2 appears to mediate neuronal
differentiation. The human NEUROD2 gene was mapped to 17q12 by fluorescence in situ
hybridization and the mouse homologue to chromosome 11 [Tamimi et al., 1997, (15)].

TELETHONIN 

Telethonin is a sarcomeric protein of 19 kD found exclusively in striated and cardiac muscle. It
appears to be localized to the Z disc of adult skeletal muscle and cultured myocytes. Telethonin is
a substrate of titin, which acts as a molecular 'ruler' for the assembly of the sarcomere by providing
spatially defined binding sites for other sarcomeric proteins. After activation by phosphorylation
and calcium/calmodulin binding, titin phosphorylates the C-terminal domain of telethonin in early
differentiating myocytes. The telethonin gene has been mapped to 17q12, adjacent to the
phenylethanolamine N-methyltransferase gene [Valle et al., 1997, (16)].

PENT, PNMT

Phenylethanolamine N-methyltransferase catalyzes the synthesis of epinephrine from
norepinephrine, the last step of catecholamine biosynthesis. The cDNA clone was first isolated in
1998 for bovine adrenal medulla PNMT using mixed oligodeoxyribonucleotide probes whose
synthesis was based on the partial amino acid sequence of tryptic peptides from the bovine enzyme
[Kaneda et al., 1988, (17)]. Using a bovine cDNA as a probe, the authors screened a human
pheochromocytoma cDNA library and isolated a cDNA clone with an insert of about 1.0 kb, which
contained a complete coding region of the enzyme. Northern blot analysis of human
pheochromocytoma polyadenylated RNA using this cDNA insert as the probe demonstrated a
single RNA species of about 1,000 nucleotides, suggesting that this clone is a full-length cDNA.
The nucleotide sequence showed that human PNMT has 282 amino acid residues with a predicted
molecular weight of 30,853, including the initial methionine. The amino acid sequence was 88%
homologous to that of the bovine enzyme. The PNMT gene was found to consist of 3 exons and 2
introns spanning about 2,100 basepairs. It was demonstrated that in transgenic mice the gene is
expressed in adrenal medulla and retina. A hybrid gene consisting of 2 kb of the PNMT
5-prime-flanking region fused to the simian virus 40 early region also resulted in tumor antigen
mRNA expression in adrenal glands and eyes; furthermore, immunocytochemistry showed that the
tumor antigen was localized in nuclei of adrenal medullary cells and cells of the inner nuclear cell
layer of the retina, both prominent sites of epinephrine synthesis. The results indicate that the
enhancer(s) for appropriate expression of the gene in these cell types are in the 2-kb
5-prime-flanking region of the gene.

Kaneda et al., 1988, (17) assigned the human PNMT gene to chromosome 17 by Southern blot
analysis of DNA from mouse-human somatic cell hybrids. In 1992 the localization was narrowed
down to 17q21-q22 by linkage analysis using RFLPs related to the PNMT gene and several 17q
DNA markers [Hoehe et al., 1992, (18)]. The findings are of interest in light of the description of
a genetic locus associated with blood pressure regulation in the stroke-prone spontaneously
hypertensive rat (SHR-SP) on rat chromosome 10 in a conserved linkage synteny group
corresponding to human chromosome 17q22-q24. See essential hypertension.

MGC9753 

This gene maps on chromosome 17, at 17q12 according to RefSeq. It is expressed at a very high
level. It is defined by cDNA clones and produces, by alternative splicing, 7 different transcripts
(SEQ ID NO:60 to 66 and 83 to 89 / Table 1), altogether encoding 7 different
protein isoforms. Of specific interest is the putatively secreted isoform g, encoded by an mRNA of
2.55 kb. Its premessenger covers 16.94 kb on the genome. It has a very long 3' UTR. The protein
(226 aa, MW 24.6 kDa, pI 8.5) contains no Pfam motif. The MGC9753 gene produces, by
alternative splicing, 7 types of transcripts, predicted to encode 7 distinct proteins. It contains 13
confirmed introns, 10 of which are alternative. Comparison to the genome sequence shows that 11
introns follow the consensual [gt-ag] rule, 1 is atypical with good support [tg^cg]. The six most
abundant isoforms are designated by a) to i) and code for proteins as follows:

a) This mRNA is 3.03 kb long; its premessenger covers 16.95 kb on the genome. It has a very
long 3' UTR. The protein (190 aa, MW 21.5 kDa, pI 7.2) contains no Pfam motif. It is
predicted to localise in the endoplasmic reticulum.

c) This mRNA is 1.17 kb long; its premessenger covers 16.93 kb on the genome. It may be
incomplete at the N terminus. The protein (368 aa, MW 41.5 kDa, pI 7.3) contains no Pfam
motif.

d) This mRNA is 3.17 kb long; its premessenger covers 16.94 kb on the genome. It has a very
long 3' UTR and 5' UTR. The protein (190 aa, MW 21.5 kDa, pI 7.2) contains no Pfam
motif. It is predicted to localise in the endoplasmic reticulum.

g) This mRNA is 2.55 kb long; its premessenger covers 16.94 kb on the genome. It has a very
long 3' UTR. The protein (226 aa, MW 24.6 kDa, pI 8.5) contains no Pfam motif. It is
predicted to be secreted.

h) This mRNA is 2.68 kb long; its premessenger covers 16.94 kb on the genome. It has a very
long 3' UTR. The protein (320 aa, MW 36.5 kDa, pI 6.8) contains no Pfam motif. It is
predicted to localise in the endoplasmic reticulum.

i) This mRNA is 2.34 kb long; its premessenger covers 16.94 kb on the genome. It may be
incomplete at the N terminus. It has a very long 3' UTR. The protein (217 aa, MW 24.4
kDa, pI 5.9) contains no Pfam motif.
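
The isoform figures listed above can be cross-checked by dividing each stated molecular weight by
its residue count; for typical proteins this gives on the order of 110 Da per residue. The sketch
below is an illustrative consistency check only. The `ISOFORMS` table simply restates the values
from the list, and the 100 to 125 Da plausibility window is an assumption chosen for illustration,
not a value from the source.

```python
# (isoform label, residues, molecular weight in kDa) as stated in the list above.
ISOFORMS = [
    ("a", 190, 21.5),
    ("c", 368, 41.5),
    ("d", 190, 21.5),
    ("g", 226, 24.6),
    ("h", 320, 36.5),
    ("i", 217, 24.4),
]


def daltons_per_residue(residues: int, mw_kda: float) -> float:
    """Average residue mass in Da implied by a protein length and mass."""
    return mw_kda * 1000.0 / residues


per_residue = {label: daltons_per_residue(n, mw) for label, n, mw in ISOFORMS}

for label, value in per_residue.items():
    # Assumed plausibility window; values outside it would suggest a transcription error.
    flag = "ok" if 100.0 <= value <= 125.0 else "check"
    print(f"isoform {label}: {value:.1f} Da/residue ({flag})")
```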

The MGC9753 gene may be a homologue of the CAB2 gene located on chromosome 17q12.
CAB2, a human homologue of the yeast COS16 required for the repair of DNA double-strand
breaks, was cloned. Autofluorescence analysis of cells transfected with its GFP fusion protein
demonstrated that CAB2 translocates into vesicles, suggesting that overexpression of CAB2 may
decrease intercellular Mn(2+) by accumulating it in the vesicles, in the same way as yeast.

Her-2/neu, ERBB2, NGL, TKR1

The oncogene originally called NEU was derived from rat neuro/glioblastoma cell lines. It encodes
a tumor antigen, p185, which is serologically related to EGFR, the epidermal growth factor
receptor. EGFR maps to chromosome 7. In 1985 it was found that the human homologue, which
was designated NGL (to avoid confusion with neuraminidase, which is also symbolized NEU),
maps to 17q12-q22 by in situ hybridization and to 17q21-qter in somatic cell hybrids [Yang-Feng
et al., 1985, (19)]. Thus, the smallest region of overlap (SRO) is 17q21-q22. Moreover, in 1985 a
potential cell surface receptor of the tyrosine kinase gene family was identified and characterized
by cloning the gene [Coussens et al., 1985, (20)]. Its primary sequence is very similar to that of the
human epidermal growth factor receptor. Because of the seemingly close relationship to the human
EGF receptor, the authors called the gene HER2. By Southern blot analysis of somatic cell hybrid
DNA and by in situ hybridization, the gene was assigned to 17q21-q22. This chromosomal location
of the gene is coincident with the NEU oncogene, which suggests that the 2 genes may in fact be
the same; indeed, sequencing indicates that they are identical. In 1988 a correlation between
overexpression of NEU protein and the large-cell, comedo growth type of ductal carcinoma was
found [van de Vijver et al., 1988, (21)]. The authors found no correlation, however, with
lymph-node status or tumor recurrence. The role of HER2/NEU in breast and ovarian cancer, which
together account for one-third of all cancers in women and approximately one-quarter of
cancer-related deaths in females, was described in 1989 [Slamon et al., 1989, (22)].

An ERBB-related gene that is distinct from the ERBB gene, called ERBB1, was found in 1985.
ERBB2 was not amplified in vulva carcinoma cells with EGFR amplification and did not react
with EGF receptor mRNA. About 30-fold amplification of ERBB2 was observed in a human
adenocarcinoma of the salivary gland. By chromosome sorting combined with velocity
sedimentation and Southern hybridization, the ERBB2 gene was assigned to chromosome 17
[Fukushige et al., 1986, (23)]. By hybridization to sorted chromosomes and to metaphase spreads
with a genomic probe, they mapped the ERBB2 locus to 17q21. This is the chromosome 17
breakpoint in acute promyelocytic leukemia (APL). Furthermore, they observed amplification and
elevated expression of the ERBB2 gene in a gastric cancer cell line. Antibodies against a synthetic
peptide corresponding to 14 amino acid residues at the COOH-terminus of a protein deduced from
the ERBB2 nucleotide sequence were raised in 1986. With these antibodies, the ERBB2 gene
product from adenocarcinoma cells was precipitated and demonstrated to be a 185-kD glycoprotein
with tyrosine kinase activity. Using a cDNA probe for ERBB2, in situ hybridization to APL cells
with a 15;17 chromosome translocation located the gene to the proximal side of the breakpoint
[Kaneko et al., 1987, (24)]. The authors suggested that both the gene and the breakpoint are
located in band 17q21.1 and, further, that the ERBB2 gene is involved in the development of
leukemia. In 1987 experiments indicated that NEU and HER2 are both the same as ERBB2 [Di
Fiore et al., 1987, (25)]. The authors demonstrated that overexpression alone can convert the gene
for a normal growth factor receptor, namely, ERBB2, into an oncogene. The ERBB2 gene was
mapped to 17q11-q21 by in situ hybridization [Popescu et al., 1989, (26)]. By in situ hybridization
to chromosomes derived from fibroblasts carrying a constitutional translocation between 15 and 17,
they showed that the ERBB2 gene was relocated to the derivative chromosome 15; the gene can thus
be localized to 17q12-q21.32. By family linkage studies using multiple DNA markers in the
17q12-q21 region, the ERBB2 gene was placed on the genetic map of the region.

Interleukin-6 (IL6) is a cytokine that was initially recognized as a regulator of immune and
inflammatory responses, but also regulates the growth of many tumor cells, including prostate cancer.
Overexpression of ERBB2 and ERBB3 has been implicated in the neoplastic transformation of
prostate cancer. Treatment of a prostate cancer cell line with IL6 induced tyrosine phosphorylation
of ERBB2 and ERBB3, but not ERBB1/EGFR. ERBB2 forms a complex with the gp130
subunit of the IL6 receptor in an IL6-dependent manner. This association was important because
the inhibition of ERBB2 activity resulted in abrogation of IL6-induced MAPK activation. Thus,
ERBB2 is a critical component of IL6 signaling through the MAP kinase pathway [Qiu et al.,
1998, (27)]. These findings showed how a cytokine receptor can diversify its signaling pathways
by engaging with a growth factor receptor kinase.

Overexpression of ERBB2 confers Taxol resistance in breast cancers. Overexpression of ERBB2
inhibits Taxol-induced apoptosis [Yu et al., 1998, (28)]. Taxol activates CDC2 kinase in
MDA-MB-435 breast cancer cells, leading to cell cycle arrest at the G2/M phase and, subsequently,
apoptosis. A chemical inhibitor of CDC2 and a dominant-negative mutant of CDC2 blocked
Taxol-induced apoptosis in these cells. Overexpression of ERBB2 in MDA-MB-435 cells by
transfection transcriptionally upregulates CDKN1A, which associates with CDC2, inhibits
Taxol-mediated CDC2 activation, delays cell entrance to G2/M phase, and thereby inhibits
Taxol-induced apoptosis. In CDKN1A antisense-transfected MDA-MB-435 cells or in p21-/- MEF
cells, ERBB2 was unable to inhibit Taxol-induced apoptosis. Therefore, CDKN1A participates in the
regulation of a G2/M checkpoint that contributes to resistance to Taxol-induced apoptosis in
ERBB2-overexpressing breast cancer cells.

A secreted protein of approximately 68 kD, designated herstatin, was described as the product of
an alternative ERBB2 transcript that retains intron 8 [Doherty et al., 1999, (29)]. This alternative
transcript specifies 340 residues identical to subdomains I and II from the extracellular domain of
p185ERBB2, followed by a unique C-terminal sequence of 79 amino acids encoded by intron 8.
The recombinant product of the alternative transcript specifically bound to ERBB2-transfected
cells and was chemically crosslinked to p185ERBB2, whereas the intron-encoded sequence alone
also bound with high affinity to transfected cells and associated with p185 solubilized from cell
extracts. The herstatin mRNA was expressed in normal human fetal kidney and liver, but was at
reduced levels relative to p185ERBB2 mRNA in carcinoma cells that contained an amplified
ERBB2 gene. Herstatin appears to be an inhibitor of p185ERBB2, because it disrupts dimers,
reduces tyrosine phosphorylation of p185, and inhibits the anchorage-independent growth of
transformed cells that overexpress ERBB2. The HER2 gene is amplified and HER2 is
overexpressed in 25 to 30% of breast cancers, increasing the aggressiveness of the tumor. Finally,
it was found that a recombinant monoclonal antibody against HER2 increased the clinical benefit
of first-line chemotherapy in metastatic breast cancer that overexpresses HER2 [Slamon et al.,
2001, (30)].

GRB7 

Growth factor receptor tyrosine kinases (GF-RTKs) are involved in activating the cell cycle.
Several substrates of GF-RTKs contain Src-homology 2 (SH2) and SH3 domains. SH2
domain-containing proteins are a diverse group of molecules important in tyrosine kinase signaling.
Using the CORT (cloning of receptor targets) method to screen a high expression mouse library, the
gene for murine Grb7, which encodes a protein of 535 amino acids, was isolated [Margolis et al.,
1992, (31)]. GRB7 is homologous to ras-GAP (ras-GTPase-activating protein). It contains an SH2
domain and is highly expressed in liver and kidney. This gene defines the GRB7 family, whose
members include the mouse gene Grb10 and the human gene GRB14.

A putative GRB7 signal transduction molecule and a novel GRB7V splice variant were isolated from
an invasive human esophageal carcinoma [Tanaka et al., 1998, (32)]. Although both GRB7
isoforms shared homology with the Mig-10 cell migration gene of Caenorhabditis elegans, the
GRB7V isoform lacked 88 basepairs in the C terminus; the resultant frameshift led to substitution
of an SH2 domain with a short hydrophobic sequence. The wildtype GRB7 protein, but not the
GRB7V isoform, was rapidly tyrosyl phosphorylated in response to EGF stimulation in esophageal
carcinoma cells. Analysis of human esophageal tumor tissues and regional lymph nodes with
metastases revealed that GRB7V was expressed in 40% of GRB7-positive esophageal carcinomas.
GRB7V expression was enhanced after metastatic spread to lymph nodes as compared to the
original tumor tissues. Transfection of an antisense GRB7 RNA expression construct lowered
endogenous GRB7 protein levels and suppressed the invasive phenotype exhibited by esophageal
carcinoma cells. These findings suggested that GRB7 isoforms are involved in cell invasion and
metastatic progression of human esophageal carcinomas. By sequence analysis, the GRB7 gene
was mapped to chromosome 17q21-q22, near the topoisomerase-2 gene [Dong et al., 1997, (33)].
GRB-7 is amplified in concert with HER2 in several breast cancer cell lines and is
overexpressed in both cell lines and breast tumors. GRB-7, through its SH2 domain, binds tightly
to HER2 such that a large fraction of the tyrosine phosphorylated HER2 in SKBR-3 cells is bound
to GRB-7 [Stein et al., 1994, (34)].

GCSF, CSF3

Granulocyte colony-stimulating factor (or colony stimulating factor-3) specifically stimulates the
proliferation and differentiation of the progenitor cells for granulocytes. The partial amino acid
sequence of purified GCSF protein was determined, and by using oligonucleotides as probes,
several GCSF cDNA clones were isolated from a human squamous carcinoma cell line cDNA
library [Nagata et al., 1986, (35)]. Cloning of human GCSF cDNA shows that a single gene codes
for a 177- or 180-amino acid mature protein of molecular weight 19,600. The authors found that
the GCSF gene has 4 introns and that 2 different polypeptides are synthesized from the same gene
by differential splicing of mRNA. The 2 polypeptides differ by the presence or absence of 3 amino
acids. Expression studies indicate that both have authentic GCSF activity. A stimulatory activity
from a glioblastoma multiforme cell line that was biologically and biochemically indistinguishable
from GCSF produced by a bladder cell line was found in 1987. By somatic cell hybridization and
in situ chromosomal hybridization, the GCSF gene was mapped to 17q11 in the region of the
breakpoint in the 15;17 translocation characteristic of acute promyelocytic leukemia [Le Beau et
al., 1987, (36)]. Further studies indicated that the gene is proximal to the said breakpoint and that it
remains on the rearranged chromosome 17. Southern blot analysis using both conventional and
pulsed field gel electrophoresis showed no rearranged restriction fragments. By use of a full-length
cDNA clone as a hybridization probe in human-mouse somatic cell hybrids and in flow-sorted
human chromosomes, the gene for GCSF was later mapped to 17q21-q22.

THRA, THRA1, ERBA, EAR7, ERBA2, ERBA3

Both human and mouse DNA have been demonstrated to have two distantly related classes of
ERBA genes and, in the human genome, multiple copies of one of the classes exist [Jansson et
al., 1983, (37)]. A cDNA derived from rat brain messenger RNA was isolated on the basis of
homology to the human thyroid receptor gene [Thompson et al., 1987, (38)]. Expression of this
cDNA produced a high-affinity binding protein for thyroid hormones. Messenger RNA from this
gene was expressed in tissue-specific fashion, with highest levels in the central nervous system and
no expression in the liver. An increasing body of evidence indicated the presence of multiple
thyroid hormone receptors. The authors suggested that there may be as many as 5 different but
related loci. Many of the clinical and physiologic studies suggested the existence of multiple
receptors. For example, patients had been identified with familial thyroid hormone resistance in
which peripheral response to thyroid hormones is lost or diminished while neuronal functions are
maintained. Thyroidologists recognize a form of cretinism in which the nervous system is severely
affected and another form in which the peripheral functions of thyroid hormone are more
dramatically affected.

The cDNA encoding a specific form of thyroid hormone receptor expressed in human liver,
kidney, placenta, and brain was isolated [Nakai et al., 1988, (39)]. Identical clones were found in
human placenta. The cDNA encodes a protein of 490 amino acids and molecular mass of 54,824.
Designated thyroid hormone receptor type alpha-2 (THRA2), this protein is represented by
mRNAs of different size in liver and kidney, which may represent tissue-specific processing of the
primary transcript.

The THRA gene contains 10 exons spanning 27 kb of DNA. The last 2 exons of the gene are
alternatively spliced. A 5-kb THRA1 mRNA encodes a predicted 410-amino acid protein; a 2.7-kb
THRA2 mRNA encodes a 490-amino acid protein. A third isoform, TR-alpha-3, is derived by
alternative splicing. The proximal 39 amino acids of the TR-alpha-2 specific sequences are deleted
in TR-alpha-3. A second gene, THRB on chromosome 3, encodes 2 isoforms of TR-beta by
alternative splicing. In 1989 the structure and function of the EAR1 and EAR7 genes, both
located on 17q21, were elucidated [Miyajima et al., 1989, (40)]. The authors determined that one
of the exons in the EAR7 coding sequence overlaps an exon of EAR1, and that the 2 genes are
transcribed from opposite DNA strands. In addition, the EAR7 mRNA generates 2 alternatively
spliced isoforms, referred to as EAR71 and EAR72, of which the EAR71 protein is the human
counterpart of the chicken c-erbA protein.

The 3 thyroid hormone receptor mRNAs (beta, alpha-1, and alpha-2) are expressed in all tissues
examined and the relative amounts of the three mRNAs were roughly parallel. None of the 3
mRNAs was abundant in liver, which is the major thyroid hormone-responsive organ. This led to
the assumption that another thyroid hormone receptor may be present in liver. It was found that
ERBA, which potentiates ERBB, has an amino acid sequence different from that of other known
oncogene products and related to those of the carbonic anhydrases [Debuire et al., 1984, (41)].
ERBA potentiates ERBB by blocking differentiation of erythroblasts at an immature stage.
Carbonic anhydrases participate in the transport of carbon dioxide in erythrocytes. In 1986 it was
shown that the ERBA protein is a high-affinity receptor for thyroid hormone. The cDNA sequence
indicates a relationship to steroid-hormone receptors, and binding studies indicate that it is a
receptor for thyroid hormones. It is located in the nucleus, where it binds to DNA and activates
transcription.

Maternal thyroid hormone is transferred to the fetus early in pregnancy and is postulated to
regulate brain development. The ontogeny of TR isoforms and related splice variants in 9
first-trimester fetal brains has been investigated by semi-quantitative RT-PCR analysis. Expression
of the TR-beta-1, TR-alpha-1, and TR-alpha-2 isoforms was detected from 8.1 weeks' gestation. An
additional truncated species was detected with the TR-alpha-2 primer set, consistent with the
TR-alpha-3 splice variant described in the rat. All TR-alpha-derived transcripts were coordinately
expressed and increased approximately 8-fold between 8.1 and 13.9 weeks' gestation. A more
complex ontogenic pattern was observed for TR-beta-1, suggestive of a nadir between 8.4 and 12.0
weeks' gestation. The authors concluded that these findings point to an important role for the
TR-alpha-1 isoform in mediating maternal thyroid hormone action during first-trimester fetal brain
development.

The identification of the several types of thyroid hormone receptor may explain the normal
variation in thyroid hormone responsiveness of various organs and the selective tissue
abnormalities found in the thyroid hormone resistance syndromes. Members of sibships who were
resistant to thyroid hormone action had retarded growth, congenital deafness, and abnormal bones,
but had normal intellect and sexual maturation, as well as augmented cardiovascular activity. In
this family abnormal T3 nuclear receptors in blood cells and fibroblasts have been demonstrated.
The availability of cDNAs encoding the various thyroid hormone receptors was considered useful
in determining the underlying genetic defect in this family.

The ERBA oncogene has been assigned to chromosome 17. The ERBA locus remains on
chromosome 17 in the t(15;17) translocation of acute promyelocytic leukemia (APL). The
thymidine kinase locus is probably translocated to chromosome 15; study of leukemia with
t(17;21) and apparently identical breakpoint showed that TK was on 21q+. By in situ hybridization
of a cloned DNA probe of c-erb-A to meiotic pachytene spreads obtained from uncultured
spermatocytes it has been concluded that ERBA is situated at 17q21.33-17q22, in the same region
as the break that generated the t(15;17) seen in APL. Because most of the grains were seen in
17q22, it was suggested that ERBA is probably in the proximal region of 17q22 or at the junction
between 17q22 and 17q21.33. By in situ hybridization it has been demonstrated that ERBA
remains at 17q11-q12 in APL, whereas TP53, at 17q21-q22, is translocated to chromosome 15.
Thus, ERBA must be at 17q11.2 just proximal to the breakpoint in the APL translocation and just
distal to it in the constitutional translocation.

The aberrant THRA expression in nonfunctioning pituitary tumors has been hypothesized to reflect
mutations in the receptor coding and regulatory sequences. THRA mRNA and THRB response
elements and ligand-binding domains were screened for sequence anomalies. Screening THRA
mRNA from 23 tumors by RNAse mismatch and sequencing candidate fragments identified 1
silent and 3 missense mutations, 2 in the common THRA region and 1 that was specific for the
alpha-2 isoform. No THRB response element differences were detected in 14 nonfunctioning
tumors, and no THRB ligand-binding domain differences were detected in 23 nonfunctioning
tumors. Therefore it has been suggested that the novel thyroid receptor mutations may be of
functional significance in terms of thyroid receptor action, and further definition of their functional
properties may provide insight into the role of thyroid receptors in growth control in pituitary cells.

RAR-alpha 

A cDNA encoding a protein that binds retinoic acid with high affinity has been cloned [Petkovich
et al., 1987, (42)]. The protein was found to be homologous to the receptors for steroid hormones,
thyroid hormones, and vitamin D3, and appeared to be a retinoic acid-inducible transacting
enhancer factor. Thus, the molecular mechanisms of the effect of vitamin A on embryonic
development, differentiation and tumor cell growth may be similar to those described for other
members of this nuclear receptor family. In general, the DNA-binding domain is most highly
conserved, both within and between the 2 groups of receptors (steroid and thyroid). Using a cDNA
probe, the RAR-alpha gene has been mapped to 17q21 by in situ hybridization [Mattei et al., 1988,
(43)]. Evidence has been presented for the existence of 2 retinoic acid receptors, RAR-alpha and
RAR-beta, mapping to chromosome 17q21.1 and 3p24, respectively. The alpha and beta forms of
RAR were found to be more homologous to the 2 closely related thyroid hormone receptors alpha
and beta, located on 17q11.2 and 3p25-p21, respectively, than to any other members of the nuclear
receptor family. These observations suggest that the thyroid hormone and retinoic acid receptors
evolved by gene, and possibly chromosome, duplications from a common ancestor, which itself
diverged rather early in evolution from the common ancestor of the steroid receptor group of the
family. It was noted that the counterparts of the human RARA and RARB genes are present in both
the mouse and chicken. The involvement of RARA at the APL breakpoint may explain why the
use of retinoic acid as a therapeutic differentiation agent in the treatment of acute myeloid
leukemias is limited to APL. Almost all patients with APL have a chromosomal translocation
t(15;17)(q22;q21). Molecular studies reveal that the translocation results in a chimeric gene
through fusion between the PML gene on chromosome 15 and the RARA gene on chromosome 17.
A hormone-dependent interaction of the nuclear receptors RARA and RXRA with CLOCK and
MOP4 has been presented.

CDC18L, CDC6



In yeasts, Cdc6 (Saccharomyces cerevisiae) and Cdc18 (Schizosaccharomyces pombe) associate
with the origin recognition complex (ORC) proteins to render cells competent for DNA
replication. Thus, Cdc6 has a critical regulatory role in the initiation of DNA replication in yeast.
cDNAs encoding Xenopus and human homologues of yeast CDC6 have been isolated [Williams et
al., 1997, (44)]. The authors designated the human and Xenopus proteins p62(cdc6). Independently,
in a yeast 2-hybrid assay using PCNA as bait, cDNAs encoding the human CDC6/Cdc18 homologue
have been isolated [Saha et al., 1998, (45)]. These authors reported that the predicted 560-amino
acid human protein shares approximately 33% sequence identity with the 2 yeast proteins. On
Western blots of HeLa cell extracts, human CDC6/Cdc18 migrates as a 66-kD protein. Although
Northern blots indicated that CDC6/Cdc18 mRNA levels peak at the onset of S phase and diminish
at the onset of mitosis in HeLa cells, the authors found that the total CDC6/Cdc18 protein level is
unchanged throughout the cell cycle. Immunofluorescent analysis of epitope-tagged protein
revealed that human CDC6/Cdc18 is nuclear in G1-phase and cytoplasmic in S-phase cells,
suggesting that DNA replication may be regulated by either the translocation of this protein
between the nucleus and cytoplasm or by selective degradation of the protein in the nucleus.
Immunoprecipitation studies showed that human CDC6/Cdc18 associates in vivo with cyclin A,
CDK2, and ORC1. The association of cyclin-CDK2 with CDC6/Cdc18 was specifically inhibited
by a factor present in mitotic cell extracts. Therefore it has been suggested that if the interaction
of CDC6/Cdc18 with the S phase-promoting factor cyclin-CDK2 is essential for the
initiation of DNA replication, the mitotic inhibitor of this interaction could prevent a premature
interaction until the appropriate time in G1. Cdc6 is expressed selectively in proliferating but not
quiescent mammalian cells, both in culture and within tissues in intact animals [Yan et al., 1998,
(46)]. During the transition from a growth-arrested to a proliferative state, transcription of
mammalian Cdc6 is regulated by E2F proteins, as revealed by a functional analysis of the human
Cdc6 promoter and by the ability of exogenously expressed E2F proteins to stimulate the
endogenous Cdc6 gene. Immunodepletion of Cdc6 by microinjection of anti-Cdc6 antibody
blocked initiation of DNA replication in a human tumor cell line. The authors concluded that
expression of human Cdc6 is regulated in response to mitogenic signals through transcriptional
control mechanisms involving E2F proteins, and that Cdc6 is required for initiation of DNA
replication in mammalian cells.

Using a yeast 2-hybrid system, co-purification of recombinant proteins, and immunoprecipitation,
it has later been demonstrated that an N-terminal segment of CDC6 binds specifically to PR48, a
regulatory subunit of protein phosphatase 2A (PP2A). The authors hypothesized that
dephosphorylation of CDC6 by PP2A, mediated by a specific interaction with PR48 or a related
B-double prime protein, is a regulatory event controlling initiation of DNA replication in mammalian
cells. By analysis of somatic cell hybrids and by fluorescence in situ hybridization the human
p62(cdc6) gene has been mapped to 17q21.3.

DNA topoisomerases are enzymes that control and alter the topologic states of DNA in both
prokaryotes and eukaryotes. Topoisomerase II from eukaryotic cells catalyzes the relaxation of
supercoiled DNA molecules, catenation, decatenation, knotting, and unknotting of circular DNA. It
appears likely that the reaction catalyzed by topoisomerase II involves the crossing-over of 2 DNA
segments. It has been estimated that there are about 100,000 molecules of topoisomerase II per
HeLa cell nucleus, constituting about 0.1% of the nuclear extract. Since several of the abnormal
characteristics of ataxia-telangiectasia appear to be due to defects in DNA processing, screening
for these enzyme activities in 5 AT cell lines has been performed [Singh et al., 1988, (47)]. In
comparison to controls, the level of DNA topoisomerase II, determined by unknotting of P4 phage
DNA, was reduced substantially in 4 of these cell lines and to a lesser extent in the fifth. DNA
topoisomerase I, assayed by relaxation of supercoiled DNA, was found to be present at normal levels.

The entire coding sequence of the human TOP2 gene has been determined [Tsai-Pflugfelder et al.,
1988, (48)].

In addition, human cDNAs that had been isolated by screening a cDNA library derived from a
mechlorethamine-resistant Burkitt lymphoma cell line (Raji-HN2) with a Drosophila Topo II
cDNA have been sequenced [Chung et al., 1989, (49)]. The authors identified 2 classes of sequence
representing 2 TOP2 isoenzymes, which have been named TOP2A and TOP2B. The sequence of 1
of the TOP2A cDNAs is identical to that of an internal fragment of the TOP2 cDNA isolated by
Tsai-Pflugfelder et al., 1988 (48). Southern blot analysis indicated that the TOP2A and TOP2B
cDNAs are derived from distinct genes. Northern blot analysis using a TOP2A-specific probe
detected a 6.5-kb transcript in the human cell line U937. Antibodies against a TOP2A peptide
recognized a 170-kD protein in U937 cell lysates. Therefore it was concluded that these data
provide genetic and immunochemical evidence for 2 TOP2 isozymes. The complete structures of
the TOP2A and TOP2B genes have been reported [Lang et al., 1998, (50)]. The TOP2A gene spans
approximately 30 kb and contains 35 exons.

Tsai-Pflugfelder et al., 1988 (48) showed that the human enzyme is encoded by a single-copy gene
which they mapped to 17q21-q22 by a combination of in situ hybridization of a cloned fragment to
metaphase chromosomes and by Southern hybridization analysis with a panel of mouse-human
hybrid cell lines. The assignment to chromosome 17 has been confirmed by the study of somatic
cell hybrids. Because of co-amplification in an adenocarcinoma cell line, it was concluded that the
TOP2A and ERBB2 genes may be closely linked on chromosome 17 [Keith et al., 1992, (51)].
Using probes that detected RFLPs at both the TOP2A and TOP2B loci, heterozygosity was
demonstrated at a frequency of 0.17 and 0.37 for the alpha and beta loci, respectively. The mouse
homologue was mapped to chromosome 11 [Kingsmore et al., 1993, (52)]. The structure and
function of type II DNA topoisomerases have been reviewed [Watt et al., 1994, (53)]. DNA
topoisomerase II-alpha is associated with the pol II holoenzyme and is a required component of
chromatin-dependent co-activation. Specific inhibitors of topoisomerase II blocked transcription
on chromatin templates, but did not affect transcription on naked templates. Addition of purified
topoisomerase II-alpha reconstituted chromatin-dependent activation activity in reactions with core
pol II. Therefore transcription on chromatin templates seems to result in the accumulation of
superhelical tension, making the relaxation activity of topoisomerase II essential for productive
RNA synthesis on nucleosomal DNA.

IGFBP4

Six structurally distinct insulin-like growth factor binding proteins have been isolated and their
cDNAs cloned: IGFBP1, IGFBP2, IGFBP3, IGFBP4, IGFBP5 and IGFBP6. The proteins display
strong sequence homologies, suggesting that they are encoded by a closely related family of genes.
The IGFBPs contain 3 structurally distinct domains each comprising approximately one-third of
the molecule. The N-terminal domain 1 and the C-terminal domain 3 of the 6 human IGFBPs show
moderate to high levels of sequence identity including 12 and 6 invariant cysteine residues in
domains 1 and 3, respectively (IGFBP6 contains 10 cysteine residues in domain 1), and are
thought to be the IGF binding domains. Domain 2 is defined primarily by a lack of sequence
identity among the 6 IGFBPs and by a lack of cysteine residues, though it does contain 2 cysteines
in IGFBP4. Domain 3 is homologous to the thyroglobulin type I repeat unit. Recombinant human
insulin-like growth factor binding proteins 4, 5, and 6 have been characterized by their expression
in yeast as fusion proteins with ubiquitin [Kiefer et al., 1992, (54)]. Results of the study suggested
to the authors that the primary effect of the 3 proteins is the attenuation of IGF activity and
that they contribute to the control of IGF-mediated cell growth and metabolism.

Moreover, IGFBPs influence EGFR and Her-2/neu mediated signaling. Addition of
IGFBPs to Her-2/neu overexpressing cells at least in part blocks the growth and survival
characteristics of the respective cells.

Based on peptide sequences of a purified insulin-like growth factor-binding protein (IGFBP), rat
IGFBP4 has been cloned by using PCR [Shimasaki et al., 1990, (55)]. The authors used the rat cDNA
to clone the human ortholog from a liver cDNA library. Human IGFBP4 encodes a 258-amino acid
polypeptide, which includes a 21-amino acid signal sequence. The protein is very hydrophilic,
which may facilitate its ability as a carrier protein for the IGFs in blood. Northern blot analysis of
rat tissues revealed expression in all tissues examined, with highest expression in liver. It was
stated that IGFBP4 acts as an inhibitor of IGF-induced bone cell proliferation. The genomic region
containing the IGFBP4 gene has been examined [Zazzi et al., 1998, (56)]. The gene consists of
4 exons spanning approximately 15 kb of genomic DNA. The upstream region of the gene
contains a TATA box and a cAMP-responsive promoter.

By in situ hybridization, the IGFBP4 gene was mapped to 17q12-q21 [Bajalica et al., 1992, (57)].
Because the hereditary breast-ovarian cancer gene BRCA1 had been mapped to the same region, it
has been investigated whether IGFBP4 is a candidate gene by linkage analysis of 22 BRCA1
families; the finding of genetic recombination suggested that it is not the BRCA1 gene [Tonin et
al., 1993, (58)].

EBI1, CCR7, CMKBR7

Using PCR with degenerate oligonucleotides, a lymphoid-specific member of the G protein-coupled
receptor family has been identified and mapped to 17q12-q21.2 by analysis of
human/mouse somatic cell hybrid DNAs and fluorescence in situ hybridization. It has been shown
that this receptor had been independently identified as the Epstein-Barr-induced cDNA (symbol
EBI1) [Birkenbach et al., 1993, (59)]. EBI1 is expressed in normal lymphoid tissues and in several
B- and T-lymphocyte cell lines. While the function and the ligand for EBI1 remain unknown, its
sequence and gene structure suggest that it is related to receptors that recognize chemoattractants,
such as interleukin-8, RANTES, C5a, and fMet-Leu-Phe. Like the chemoattractant receptors, EBI1
contains intervening sequences near its 5-prime end; however, EBI1 is unique in that both of its
introns interrupt the coding region of the first extracellular domain. Mouse Ebi1 cDNA has been
isolated and found to encode a protein with 86% identity to the human homologue.

Subsets of murine CD4+ T cells localize to different areas of the spleen after adoptive transfer.
Naive and T helper-1 (TH1) cells, which express CCR7, home to the periarteriolar lymphoid
sheath, whereas activated TH2 cells, which lack CCR7, form rings at the periphery of the T-cell
zones near B-cell follicles. It has been found that retroviral transduction of TH2 cells with CCR7
forced them to localize in a TH1-like pattern and inhibited their participation in B-cell help in vivo
but not in vitro. Apparently, differential expression of chemokine receptors results in unique
cellular migration patterns that are important for effective immune responses.

CCR7 expression divides human memory T cells into 2 functionally distinct subsets. CCR7-
memory cells express receptors for migration to inflamed tissues and display immediate effector
function. In contrast, CCR7+ memory cells express lymph node homing receptors and lack
immediate effector function, but efficiently stimulate dendritic cells and differentiate into CCR7-
effector cells upon secondary stimulation. The CCR7+ and CCR7- T cells, named central memory
(T-CM) and effector memory (T-EM), differentiate in a step-wise fashion from naive T cells,
persist for years after immunization, and allow a division of labor in the memory response.

CCR7 expression in memory CD8+ T lymphocyte responses to HIV and to cytomegalovirus
(CMV) tetramers has been evaluated. Most memory T lymphocytes express CD45RO, but a
fraction express instead the CD45RA marker. Flow cytometric analyses of marker expression and
cell division identified 4 subsets of HIV- and CMV-specific CD8+ T cells, representing a lineage
differentiation pattern: CD45RA+CCR7+ (double-positive); CD45RA-CCR7+; CD45RA-CCR7-
(double-negative); CD45RA+CCR7-. The capacity for cell division, as measured by 5-(and
6-)carboxyl-fluorescein diacetate, succinimidyl ester, and intracellular staining for the Ki67
nuclear antigen, is largely confined to the CCR7+ subsets and occurred more rapidly in cells that
are also CD45RA+. Although the double-negative cells did not divide or expand after stimulation,
they did revert to positivity for either CD45RA or CCR7 or both. The CD45RA+CCR7- cells,
considered to be terminally differentiated, fail to divide, but do produce interferon-gamma and
express high levels of perforin. The representation of subsets specific for CMV and for HIV is
distinct. Approximately 70% of HIV-specific CD8+ memory T cells are double-negative or
preterminally differentiated compared to 40% of CMV-specific cells. Approximately 50% of the
CMV-specific CD8+ memory T cells are terminally differentiated compared to fewer than 10% of
the HIV-specific cells. It has been proposed that terminally differentiated CMV-specific cells are
poised to rapidly intervene, while double-positive precursor cells remain for expansion and
replenishment of the effector cell pool. Furthermore, high-dose antigen tolerance and the depletion
of HIV-specific CD4+ helper T-cell activity may keep the HIV-specific memory CD8+ T cells at
the double-negative stage, unable to differentiate to the terminal effector state.

B lymphocytes recirculate between B cell-rich compartments (follicles or B zones) in secondary
lymphoid organs, surveying for antigen. After antigen binding, B cells move to the boundary of B
and T zones to interact with T-helper cells. Furthermore it has been demonstrated that
antigen-engaged B cells have increased expression of CCR7, the receptor for the T-zone chemokines
CCL19 (also known as ELC) and CCL21, and that they exhibit increased responsiveness to both
chemoattractants. In mice lacking lymphoid CCL19 and CCL21 chemokines, or with B cells that lack
CCR7, antigen engagement fails to cause movement to the T zone. Using retroviral-mediated gene
transfer, the authors demonstrated that increased expression of CCR7 is sufficient to direct B cells
to the T zone. Reciprocally, overexpression of CXCR5, the receptor for the B-zone chemokine CXCL13,
is sufficient to overcome antigen-induced B-cell movement to the T zone. This points toward a
mechanism of B-cell relocalization in response to antigen, and establishes that cell position in vivo
can be determined by the balance of responsiveness to chemoattractants made in separate but
adjacent zones.

BAF57, SMARCE1

The SWI/SNF complex in S. cerevisiae and Drosophila is thought to facilitate transcriptional
activation of specific genes by antagonizing chromatin-mediated transcriptional repression. The
complex contains an ATP-dependent nucleosome disruption activity that can lead to enhanced
binding of transcription factors. The BRG1/brm-associated factors, or BAF, complex in mammals
is functionally related to SWI/SNF and consists of 9 to 12 subunits, some of which are
homologous to SWI/SNF subunits. A 57-kD BAF subunit, BAF57, is present in higher eukaryotes,
but not in yeast. Partial coding sequence has been obtained from purified BAF57 from extracts of
a human cell line [Wang et al., 1998, (60)]. Based on the peptide sequences, the authors identified
cDNAs encoding BAF57. The predicted 411-amino acid protein contains an HMG domain
adjacent to a kinesin-like region. Both recombinant BAF57 and the whole BAF complex bind
4-way junction (4WJ) DNA, which is thought to mimic the topology of DNA as it enters or exits the
nucleosome. The BAF57 DNA-binding activity has characteristics similar to those of other HMG
proteins. It was found that complexes with mutations in the BAF57 HMG domain retain their
DNA-binding and nucleosome-disruption activities. The authors suggested that the mechanism by
which mammalian SWI/SNF-like complexes interact with chromatin may involve recognition of
higher-order chromatin structure by 2 or more DNA-binding domains. RNase protection studies and
Western blot analysis revealed that BAF57 is expressed ubiquitously. Several lines of evidence
point toward the involvement of SWI/SNF factors in cancer development [Klochendler-Yeivin et
al., 2002, (61)]. Moreover, SWI/SNF related genes are assigned to chromosomal regions that are
frequently involved in somatic rearrangements in human cancers [Ring et al., 1998, (62)]. In this
respect it is interesting that some of the SWI/SNF family members (i.e. SMARCC1, SMARCC2,
SMARCD1 and SMARCD2) are neighboring 3 of the eukaryotic ARCHEONs we have identified
(i.e. 3p21-p24, 12q13-q14 and 17q, respectively) and which are part of the present invention. In this
invention we could also map SMARCE1/BAF57 to the 17q12 region by PCR karyotyping.

KRT10, K10

Keratin 10 is an intermediate filament (IF) chain which belongs to the acidic type I family and is
expressed in terminally differentiated epidermal cells. Epithelial cells almost always co-express
pairs of type I and type II keratins, and the pairs that are co-expressed are highly characteristic of a
given epithelial tissue. For example, in human epidermis, 3 different pairs of keratins are
expressed: keratins 5 (type II) and 14 (type I), characteristic of basal or proliferative cells; keratins
1 (type II) and 10 (type I), characteristic of suprabasal terminally differentiating cells; and keratins
6 (type II) and 16 (type I) (and keratin 17 [type I]), characteristic of cells induced to
hyperproliferate by disease or injury, and epithelial cells grown in cell culture. The nucleotide sequence
of a 1,700 bp cDNA encoding human epidermal keratin 10 (56.5 kD) [Darmon et al., 1987, (63)]
has been published as well as the complete amino acid sequence of human keratin 10 [Zhou et al.,
1988, (64)]. Polymorphism of the KRT10 gene, restricted to insertions and deletions of the
glycine-rich quasipeptide repeats that form the glycine-loop motif in the C-terminal domain, has
been extensively described [Korge et al., 1992, (65)].

By use of specific cDNA clones in conjunction with somatic cell hybrid analysis and in situ
hybridization, the KRT10 gene has been mapped to 17q12-q21 in a region proximal to the breakpoint
at 17q21 that is involved in a t(17;21)(q21;q22) translocation associated with a form of acute
leukemia. KRT10 appeared to be telomeric to 3 other loci that map in the same region: CSF3,
ERBA1, and HER2 [Lessin et al., 1988, (66)]. NGFR and HOX2 are distal to K9. It has been
demonstrated that the KRT10, KRT13, and KRT15 genes are located in the same large pulsed field
gel electrophoresis fragment [Romano et al., 1991, (67)]. A correlation of assignments of the 3
genes makes 17q21-q22 the likely location of the cluster. Transgenic mice expressing a mutant
keratin 10 gene have the phenotype of epidermolytic hyperkeratosis, thus suggesting that a genetic
basis for the human disorder resides in mutations in genes encoding suprabasal keratins KRT1 or
KRT10 [Fuchs et al., 1992, (68)]. The authors also showed that stimulation of basal cell
proliferation can result from a defect in suprabasal cells and that distortion of nuclear shape or
alterations in cytokinesis can occur when an intermediate filament network is perturbed. In a
family with keratosis palmaris et plantaris without blistering either spontaneously or in response to
mild mechanical or thermal stress and with no involvement of the skin and parts of the body other
than the palms and soles, a tight linkage to an insertion-deletion polymorphism in the C-terminal
coding region of the KRT10 gene (maximum lod score = 8.36 at theta = 0.00) was found [Rogaev
et al., 1993, (69)]. It is noteworthy that it was a rare, high molecular weight allele of the KRT10
polymorphism that segregated with the disorder. The allele was observed once in 96 independent
chromosomes from unaffected Caucasians. The KRT10 polymorphism arose from the
insertion/deletion of imperfect (CCG)n repeats within the coding region and gave rise to a variable
glycine loop motif in the C-terminal tail of the keratin 10 protein. It is possible that there was a
pathogenic role for the expansion of the imperfect trinucleotide repeat.
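
For orientation, the lod score quoted above can be illustrated with a minimal sketch. It assumes the simplest textbook case of phase-known, fully informative meioses, which is not necessarily the model used in the cited linkage study; the function name lod_score and its parameters are hypothetical and serve only as illustration.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score, log10[L(theta) / L(0.5)], for phase-known meioses.

    Simplifying assumption: every meiosis is fully informative, so the
    likelihood is theta**recombinants * (1 - theta)**non_recombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")

    non_recombinants = meioses - recombinants
    log_l_theta = 0.0
    if recombinants:
        if theta == 0.0:
            # Any observed recombinant excludes complete linkage.
            return float("-inf")
        log_l_theta += recombinants * math.log10(theta)
    if non_recombinants:
        log_l_theta += non_recombinants * math.log10(1.0 - theta)

    # Null hypothesis: free recombination (theta = 0.5).
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null
```

Under these assumptions, a lod score at theta = 0 with no recombinants equals the number of informative meioses multiplied by log10(2), so each additional non-recombinant meiosis adds about 0.3 to the score.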

KRT12, K12

Keratins are a group of water-insoluble proteins that form 10 nm intermediate filaments in
epithelial cells. Approximately 30 different keratin molecules have been identified. They can be
divided into acidic and basic-neutral subfamilies according to their relative charges,
immunoreactivity, and sequence homologies to types I and II wool keratins, respectively. In vivo, a
basic keratin usually is co-expressed and 'paired' with a particular acidic keratin to form a
heterodimer. The expression of various keratin pairs is tissue specific, differentiation dependent,
and developmentally regulated. The presence of specific keratin pairs is essential for the
maintenance of the integrity of epithelium. For example, mutations in the human K14/K5 pair and the
K10/K1 pair underlie the skin diseases epidermolysis bullosa simplex and epidermolytic
hyperkeratosis, respectively. Expression of the K3 and K12 keratin pair has been found in the
cornea of a wide number of species, including human, mouse, and chicken, and is regarded as a
marker for corneal-type epithelial differentiation. The murine Krt12 (Krt1.12) gene has been
cloned, and it has been demonstrated that its expression is corneal epithelial cell specific,
differentiation dependent, and developmentally regulated [Liu et al., 1993, (70)]. The
corneal-specific nature of keratin 12 gene expression signifies that keratin 12 plays a unique role in
maintaining normal corneal epithelial function. Nevertheless, the exact function of keratin 12
remains unknown and no hereditary human corneal epithelial disorder has been linked directly to
a mutation in the keratin 12 gene. As part of a study of the expression profile of human corneal
epithelial cells, a cDNA with an open reading frame highly homologous to the cornea-specific
mouse keratin 12 gene has been isolated [Nishida et al., 1996, (71)]. To elucidate the function of
keratin 12, knockout mice lacking the Krt1.12 gene have been created by gene targeting techniques.
The heterozygous mice appeared normal. Homozygous mice developed normally and suffered mild
corneal epithelial erosion. The corneal epithelia were fragile and could be removed by gentle
rubbing of the eyes or brushing. The corneal epithelium of the homozygotes did not express
keratin 12 as judged by immunohistochemistry, Western immunoblot analysis with epitope-specific
anti-keratin 12 antibodies, Northern hybridization, and in situ hybridization with an antisense
keratin 12 riboprobe. The KRT12 gene has been mapped to 17q by study of radiation hybrids and
localized to the type I keratin cluster in the interval between D17S800 and D17S930 (17q12-q21)
[Nishida et al., 1997, (72)]. The authors presented the exon-intron boundary structure of the KRT12
gene and mapped the gene to 17q12 by fluorescence in situ hybridization. The gene contains 7
introns, defining 8 exons that cover the coding sequence. Together the exons and introns span
approximately 6 kb of genomic DNA.

Meesmann corneal dystrophy is an autosomal dominant disorder causing fragility of the anterior
corneal epithelium, where the cornea-specific keratins K3 and K12 are expressed. Dominant-negative
mutations in these keratins might be the cause of Meesmann corneal dystrophy. Indeed,
linkage of the disorder to the K12 locus in Meesmann's original German kindred [Meesmann and
Wilke, 1939, (73)] with Z(max) = 7.53 at theta = 0.0 has been found. In 2 pedigrees from Northern
Ireland, it was found that the disorder co-segregated with K12 in one pedigree and K3 in the other.
Heterozygous missense mutations in K3 or in K12 (R135T, V143L) in each family have been
identified. All these mutations occurred in highly conserved keratin helix boundary motifs, where
dominant mutations in other keratins have been found to compromise cytoskeletal function
severely, leading to keratinocyte fragility.

The regions of the human KRT12 gene have been sequenced to enable mutation detection for all
exons using genomic DNA as a template [Corden et al., 2000, (74)]. The authors found that the
human genomic sequence spans 5,919 bp and consists of 8 exons. A microsatellite dinucleotide
repeat was identified within intron 3, which was highly polymorphic and which they developed for
use in genotype analysis. In addition, 2 mutations in the helix initiation motif of K12 were found in
families with Meesmann corneal dystrophy. In an American kindred, a missense M129T mutation
was found in the KRT12 gene. They stated that a total of 8 mutations in the KRT12 gene had been
reported.

Genetic interactions within ARCHEONs 

Genes involved in genomic alterations (amplifications, insertions, translocations, deletions, etc.)
exhibit changes in their expression pattern. Of particular interest are gene amplifications, which
result in gene copy numbers >2 per cell, and deletions, which result in gene copy numbers <2 per
cell. Gene copy number and gene expression of the respective genes do not necessarily correlate.
Transcriptional overexpression needs an intact transcriptional context, as determined by regulatory
regions at the chromosomal locus (promoter, enhancer and silencer), and sufficient amounts of
transcriptional regulators being present in effective combinations. This is especially true for
genomic regions whose expression is tightly regulated in specific tissues or during specific
developmental stages. ARCHEONs are specified by gene clusters of more than two genes being
directly neighboured or in chromosomal order, interspersed by a maximum of 10, preferably 7,
more preferably 5 or at least 1 gene. The interspersed genes are also co-amplified but do not
directly interact with the ARCHEON. Such an ARCHEON may spread over a chromosomal region
of a maximum of 20, more preferably 10 or at least 6 Megabases. The nature of an ARCHEON is
characterized by the simultaneous amplification and/or deletion and the correlating expression (i.e.
upregulation or downregulation respectively) of the encompassed genes in a specific tissue, cell
type, cellular or developmental state or time point. Such ARCHEONs are commonly conserved
during evolution, as they play critical roles during cellular development. In the case of these
ARCHEONs whole gene clusters are overexpressed upon amplification as they harbor
self-regulatory feedback loops, which stabilize gene expression and/or biological effector function even
in abnormal biological settings, or are regulated by very similar transcription factor combinations,
reflecting their simultaneous function in specific tissues at certain developmental stages.
Therefore, the gene copy number correlates with the expression level especially for genes in gene
clusters functioning as ARCHEONs. In the case of abnormal gene expression in neoplastic lesions it
is of great importance to know whether the self-regulatory feedback loops have been conserved as
they determine the biological activity of the ARCHEON gene members.
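
The structural criteria given above (more than two member genes in chromosomal order, at most 10 interspersed genes, a span of at most 20 Megabases, and copy number changes accompanied by correlating expression changes) can be sketched as a simple scan over an ordered gene list. This is an illustrative sketch only, not the procedure used in the invention: the Gene record, the expression cutoff of 2.0 and all function names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str                # hypothetical gene label
    position_mb: float       # chromosomal position in Megabases (assumed input)
    copy_number: float       # gene copies per cell; 2 is the normal diploid value
    expression_ratio: float  # abnormal/normal expression ratio (assumed input)


def is_coaltered(gene: Gene, expr_cutoff: float = 2.0) -> bool:
    """Copy number change with an expression change in the same direction."""
    amplified = gene.copy_number > 2 and gene.expression_ratio >= expr_cutoff
    deleted = gene.copy_number < 2 and gene.expression_ratio <= 1.0 / expr_cutoff
    return amplified or deleted


def candidate_clusters(genes: List[Gene], max_interspersed: int = 10,
                       max_span_mb: float = 20.0,
                       min_members: int = 3) -> List[List[Gene]]:
    """Group co-altered genes of one chromosome into candidate clusters."""
    ordered = sorted(genes, key=lambda g: g.position_mb)
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    last_index = -1
    for index, gene in enumerate(ordered):
        if not is_coaltered(gene):
            continue
        if current:
            interspersed = index - last_index - 1
            span = gene.position_mb - current[0].position_mb
            if interspersed > max_interspersed or span > max_span_mb:
                # Close the running cluster; keep it only if it has
                # more than two members.
                if len(current) >= min_members:
                    clusters.append(current)
                current = []
        current.append(gene)
        last_index = index
    if len(current) >= min_members:
        clusters.append(current)
    return clusters
```

The default limits mirror the broadest values stated in the text; the narrower preferred values (for example 7 or 5 interspersed genes, or 10 Megabases) can be passed as arguments. The sketch does not model the functional interaction between member genes, which the text treats as a defining feature of an ARCHEON.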

The intensive interaction between genes in ARCHEONs is described for the 17q21 ARCHEON
(Fig. 1) by way of illustration, not by limitation. In one embodiment the presence or absence of
alterations of genes within distinct genomic regions are correlated with each other, as exemplified
for breast cancer cell lines (Fig. 3 and Fig. 4). This relates to the discovery of the present
invention that multiple interactions of said gene products of defined chromosomal localizations
occur, which according to their respective alterations in abnormal tissue have predictive,
diagnostic, prognostic and/or preventive and therapeutic value. These interactions are mediated
directly or indirectly, due to the fact that the respective genes are part of interconnected or
independent signaling networks or regulate cellular behavior (differentiation status, proliferative
and/or apoptotic capacity, invasiveness, drug responsiveness, immune modulatory activities) in a
synergistic, antagonistic or independent fashion. The order of functionally important genes within
the ARCHEONs has been conserved during evolution (e.g. the ARCHEON on human chromosome
17q21 is present on mouse chromosome 11). Moreover, it has been found that the 17q21
ARCHEON is also present on human chromosome 3p21 and 12q13, both of which are also
involved in amplification events and in tumor development. Most probably these homologous
ARCHEONs were formed by duplications and rearrangements during vertebrate evolution.
Homologous ARCHEONs consist of homologous genes and/or isoforms of specific gene families
(e.g. RARA or RARB or RARG, THRA or THRB, TOP2A or TOP2B, RAB5A or RAB5B,
BAF170 or BAF155, BAF60A or BAF60B, WNT5A or WNT5B, IGFBP4 or IGFBP6). Moreover
these regions are flanked by homologous chromosomal gene clusters (e.g. CACN, SCYA, HOX,
Keratins). These ARCHEONs have diverged during evolution to fulfill their respective functions
in distinct tissues (e.g. the 17q21 ARCHEON has one of its main functions in the central nervous
system). Due to their tissue specific function extensive regulatory loops control the expression of
the members of each ARCHEON. During tumor development these regulations become critical for
the characteristics of the abnormal tissues with respect to differentiation, proliferation, drug
responsiveness and invasiveness. It has been found that the co-amplification of genes within
ARCHEONs can lead to co-expression of the respective gene products. Some of said genes also
exhibit additional mutations or specific patterns of polymorphisms, which are substantial for the
oncogenic capacities of these ARCHEONs. One of the critical features of such amplicons is
which members of the ARCHEON have been conserved during tumor formation (e.g. during
amplification and deletion events), thereby defining these genes as diagnostic marker genes.
Moreover, the expression of certain genes within the ARCHEON can be influenced by other
members of the ARCHEON, thereby defining the regulatory and regulated genes as target genes
for therapeutic intervention. It was also observed that the expression of certain members of the
ARCHEON is sensitive to drug treatment (e.g. TOPO2 alpha, RARA, THRA, HER-2), which
defines these genes as "marker genes". Moreover several other genes are suitable for therapeutic
intervention by antibodies (CACNB1, EBI1), ligands (CACNB1) or drugs such as kinase
inhibitors (CrkRS, CDC6). The following examples of interactions between members of
ARCHEONs are offered by way of illustration, not by way of limitation.

EBI1/CCR7 is a lymphoid-specific member of the G protein-coupled receptor family. EBI1
recognizes chemoattractants, such as interleukin-8, SCYAs, Rantes, C5a, and fMet-Leu-Phe. The
capacity for cell division is largely confined to the CCR7+ subsets in lymphocytes. Double-negative
cells did not divide or expand after stimulation. CCR7- cells, considered to be terminally
differentiated, fail to divide, but do produce interferon-gamma and express high levels of perforin.
EBI1 is induced by viral activities such as those of the Epstein-Barr virus. Therefore, EBI1 is
associated with transformation events in lymphocytes. A functional role of EBI1 during tumor
formation in non-lymphoid tissues has been investigated in this invention. Interestingly, ERBA and
ERBB, located in the same genomic region, are also associated with lymphocyte transformation.
Moreover, ligands of the receptor (i.e. SCYA5/Rantes) are in genomic proximity on 17q. Abnormal
expression of both of these factors in lymphoid and non-lymphoid tissues establishes an
autoregulatory feedback loop, inducing signaling events within the respective cells. Expression of
lymphoid factors has an effect on immune cells and modulates cellular behavior. This is of particular
interest with regard to abnormal breast tissue being infiltrated by lymphocytes. In line with this,
another immunomodulatory and proliferation factor is located nearby on 17q21. Granulocyte
colony-stimulating factor (GCSF3) specifically stimulates the proliferation and differentiation of
the progenitor cells for granulocytes. A stimulatory activity from a glioblastoma multiforme cell
line that is biologically and biochemically indistinguishable from GCSF produced by a bladder cell
line has also been found. Colony-stimulating factors not only affect immune cells, but also induce
cellular responses of non-immune cells, indicating possible involvement in tumor development
upon abnormal expression. In addition, several other genes of the 17q21 ARCHEON are involved
in proliferation, survival, differentiation of immune cells and/or lymphoblastic leukemia, such as
MLLT6, ZNF144 and ZNFN1A3, again demonstrating the related functions of the gene products
in interconnected key processes within specific cell types. Aberrant expression of more than one of
these genes in non-immune cells constitutes signalling activities that contribute to the oncogenic
activities that derive solely from overexpression of the Her-2/neu gene.

PPARBP has been found in complex with the tumor suppressor gene of the p53 family. Moreover,
PPARBP also binds to PPAR-alpha (PPARA), RAR-alpha (RARA), RXR, THRA and TR-beta-1.
Due to its ability to bind to thyroid hormone receptors it has been named TRIP2 and TRAP220. In
these complexes PPARBP affects gene regulatory activities. Interestingly, PPARBP is located in
genomic proximity to its interaction partners THRA and RARA. We have found PPARBP to be
co-amplified with THRA and RARA in tumor tissue. THRA has been isolated from avian
erythroblastosis virus in conjunction with ERBB and therefore was named ERBA. ERBA
potentiates ERBB by blocking differentiation of erythroblasts at an immature stage. ERBA has
been shown to influence ERBB expression. In this setting deletions of C-terminal portions of the
THRA gene product are of influence. Aberrant THRA expression has also been found in
nonfunctioning pituitary tumors, which has been hypothesized to reflect mutations in the receptor
coding and regulatory sequences. THRA function promotes tumor cell development by regulating
gene expression of regulatory genes and by influencing metabolic activities (e.g. of key enzymes
of alternative metabolic pathways in tumors such as malic enzyme and genes responsible for
lipogenesis). The observed activities of nuclear receptors not only reflect their transactivating
potential, but are also due to posttranscriptional activities in the absence or presence of ligands.
Co-amplification of THRA/ERBA and ERBB has been shown, but its influence on tumor
development has been doubted as no overexpression could be demonstrated in breast tumors [van
de Vijver et al., 1987, (75)]. THRA and RARA are part of the nuclear receptor family, whose function
can be mediated as monomers, homodimers or heterodimers. RARA regulates differentiation of a
broad spectrum of cells. Interactions of hormones with ERBB expression have been investigated.
Ligands of RARA can inhibit the expression of amplified ERBB genes in breast tumors
[Offterdinger et al., 1998, (76)]. As part of this invention, co-amplification and co-expression of
THRA and RARA could be shown. It was also found that multiple genes, which are regulated by
members of the thyroid hormone receptor and retinoic acid receptor families, are differentially
expressed in tumor samples, corresponding to their genomic alterations (amplification, mutation,
deletion). These hormone receptor genes and respective target genes are useful to discriminate
patient samples with respect to clinical features.

By expression analysis of multiple normal tissues, tumor samples and tumor cell lines and
subsequent clustering of the 17q21 region, it was found that the expression profile of Her-2/neu
positive tumor cells and tumor samples exhibits similarities with the expression pattern of tissue
from the central nervous system (Fig. 2). This is in line with the observed malformations in the
central nervous system of Her-2/neu and THRA knock-out mice. Moreover, it was found that
NEUROD2, a nuclear factor involved specifically in neurogenesis, is commonly expressed in the
respective samples. This led to the definition of the 17q21 locus as being an "ARCHEON",
whose primary function in normal organ development is assigned to the central nervous system.
Surprisingly, the expression of NEUROD2 was affected by therapeutic intervention. Strikingly,
ZNF144, TEM7, PIP5K and PPP1R1B are also expressed in neuronal cells, where they display
diverse tissue specific functions.
In addition, Her-2/neu is often co-amplified with GRB7, a downstream member of the signaling
cascade being involved in invasive properties of tumors. Surprisingly, we have found another
member of the Her-2/neu signaling cascade being overexpressed in primary breast tumors: TOB1
(= "Transducer of ERBB signaling"). Strong overexpression of TOB1 correlated with weaker
overexpression of Her-2/neu, already indicating its involvement in oncogenic signaling activities.
Amplification of Her-2/neu has been assigned to enhanced proliferative capacity, due to the
identified downstream components of the signaling cascade (e.g. Ras-Raf-MAPK). In this respect
it was surprising that some cdc genes, which are cell cycle dependent kinases, are part of the
amplicons, which upon altered expression have great impact on cell cycle progression.

The ARCHEONs on 17q21 and 12q13 are very closely related, as they not only harbor
isoforms of specific genes (e.g. CACNB1 vs. CACNB3, ERBB2 vs. ERBB3, RARA vs. RARG,
see below), but are even flanked by whole gene clusters, consisting of multiple isoforms of one
gene family positioned in tandem, such as the keratin and the HOX gene cluster. In this respect the
simultaneous presence of keratins and receptors of the EGFR family, i.e. ERBB2 (Her-2/neu) and
ERBB3 (Her-3), is of special interest, as the expression of individual keratins is very tightly
controlled by the EGFR signalling pathway.

Keratins are a group of water-insoluble proteins that form 10 nm intermediate filaments in
epithelial cells. Approximately 30 different keratin molecules have been identified. They can be
divided into acidic and basic-neutral subfamilies according to their relative charges,
immunoreactivity, and sequence homologies to types I and II wool keratins, respectively. In vivo, a
basic keratin usually is co-expressed and paired with a particular acidic keratin to form a
heterodimer. The expression of various keratin pairs is tissue specific, differentiation dependent,
and developmentally regulated. The presence of specific keratin pairs is essential for the
maintenance of the integrity of epithelium. Alterations of keratin expression have been observed in
tumor epithelium, with an abnormal keratin pattern being expressed in tumor cells compared to the
adjacent normal tissue. Mutations in the human K14/K5 pair and the K10/K1 pair underlie skin
diseases such as epidermolysis bullosa simplex and epidermolytic hyperkeratosis. The expression
of these and other keratins within the skin is tightly regulated. For example, the expression of the
K14/K5 pair is restricted to the basal cell layer of the skin, displaying no overlap with the K10/K1
pair, which is solely expressed in the suprabasal layer. Gene expression is very tightly controlled
by an interplay of multiple signalling cascades such as the EGFR, TGFR, sonic hedgehog and wnt
signaling, involving receptor tyrosine kinases and serine/threonine kinases. In addition,
posttranslational modifications such as serine/threonine and/or tyrosine phosphorylation events
affect keratin function, and can be attributed to receptor tyrosine kinase signalling and MAPK and
ERK activity. Posttranslational modifications of keratins not only alter the solubility of keratins,
but also affect nuclear and signalling functions (e.g. after association with 14-3-3 protein). In
addition, we did observe genomic alteration of the keratin gene clusters perturbing the keratin
expression pattern.

Moreover, the physical interaction of keratins, which are located in ARCHEONs of different
chromosomes and whose cell type specific expression at distinct differentiation status is regulated
by members of the same ARCHEONs, is a superb example of the genetic interaction of ARCHEON
genes. Examples of this tight interaction between the 12q13 and 17q21 ARCHEONs are the
expression and physical interaction of keratin 5 (basic keratin Type II located on 12q13) and
keratin 14 (acidic keratin Type I located on 17q21) in the basal layer of the skin, which is shut off
in the suprabasal layer and compensated by the expression and physical interaction of keratin 1
(basic keratin Type II located on 12q13) and keratin 10 (acidic keratin Type I located on 17q21).
Diverse control mechanisms confer this exclusive expression control, including chromosomal
positioning and growth factor signaling activities. Interestingly, critical keratins are
chromosomally positioned in an ordered fashion reflecting their related but exclusive function in
different keratin pairs and in specific tissues, resembling the structure and function relationship of
the HOX gene clusters on the same chromosomes. Moreover, keratins whose mutation results in
specific skin disorders (e.g. mutation of K5 and K14 results in hand and foot syndrome) are located
at similar positions within the ARCHEONs on chromosome 17q21 and 12q13. The genes are in
close proximity to genes involved in signaling events (e.g. ERBBs and RARs) regulating
proliferation, differentiation and apoptotic events also in the skin tissue. For example, Her-2/neu is
specifically expressed within the basal layer of the skin, where asymmetric cell divisions of adult
stem cells or early progenitor cells thereof give rise to a non-differentiated daughter cell residing
in the basal layer and a differentiating daughter cell which subsequently moves to the
suprabasal compartment. These asymmetric cell divisions guarantee the self-renewal and the
cellular homeostasis of the skin tissue. This is of importance for the biological functions of the
skin such as the barrier function towards environmental stress including infectious agents.
Perturbation of the signalling activities within the skin results in diseases similar to the hereditary
disorders reflecting mutations of specific keratin genes. In clinical studies it has been shown that
blocking EGFR signalling by antibody treatment (e.g. cetuximab) and small molecule inhibitors
(e.g. Iressa) targeted to the receptor tyrosine kinases can result in skin diseases (e.g. acne-like rash)
of grade I, II or III. It is part of this invention that these skin diseases not only reflect side effects
of the respective treatments, but are an example of systemic changes occurring as a consequence of
the therapeutic regimen, thereby indicating susceptibility of the endogenous signaling network to the
therapeutic agents. This observation can have consequences for therapeutic decisions, as the
therapeutic regimen is normally stopped or reduced upon occurrence of side effects. However,
as the side effects (e.g. the skin diseases occurring under anti growth factor treatment) are
indicative of response to treatment (e.g. tumor shrinkage), the treatment should be continued even
though "adverse" drug responses occur, and side effects should be treated separately by agents
alleviating the symptoms. Skin diseases such as rash and hand and foot syndrome are just examples
of a given side effect under a given treatment (i.e. anti tumor therapy) that can be used for
response correlation.

Similarly to blocking the receptor molecules themselves, blocking downstream members of these
signaling cascades results mainly in skin diseases (e.g. hand-and-foot syndromes). Surprisingly, we
observed that treating tumor cells with agents blocking the EGFR/Her-2/neu signaling (e.g.
Cetuximab, Iressa, Herceptin, RAF kinase inhibitor, etc.) shifts the expression of specific keratins
being part of the ARCHEONs described in this invention. Moreover, the altered expression of
keratins in tumor cells of patients is paralleled by a shift of keratin expression in the keratinocytes
of the skin of the very same patient. Perturbation of keratin expression and/or post-transcriptional
modification in the skin tissue seems to resemble the susceptibility of the endogenous growth factor
signaling pathways to the respective treatment. The resulting skin diseases are therefore at least to
some extent indicative of the tumor responsiveness to the regimen. This endogenous
responsiveness to anti growth factor signaling agents can also be delineated from polymorphisms
and genetic alterations (e.g. mutations) being present within the ARCHEON described in this
invention. Of particular interest in this context are polymorphisms being present in the keratin
genes. However, polymorphisms within keratins, keratin related genes and/or genes functionally
connected to the keratin-based cytoskeleton, not necessarily being present within the ARCHEONs
described, are also of importance according to their physical interaction with the respective gene
products (e.g. ITGB4). It is part of this invention that the responsiveness of a given tumor to anti
growth factor treatment relates to the genetic predisposition of the respective signaling pathway
members and target genes, which include keratins and related genes that are markers for
proliferation, differentiation and apoptosis in normal tissues, such as skin tissue. This knowledge
can be used to predict the responsiveness of a tumor based on the characterization of surrogate
tissues, such as skin, blood and any other normal tissue containing the above mentioned genes
and/or gene products. For example, the responsiveness to Iressa, RAF-kinase inhibitor and antibody
based therapies targeting EGFR and Her-2/neu can be delineated from punch biopsies of the skin
(preferably by comparison of pre- and/or post-treatment samples) or blood samples by determining
the expression pattern or genetic characterization of keratin or keratin-related genes of an
individual patient. Moreover, the responsiveness of such surrogate tissues can then be correlated to
the tumor phenotype and the responsiveness of a tumor to the respective treatment, thereby predicting
therapy outcome. The examples of surrogate tissues are given by way of illustration and not by
limitation.
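
The comparison of pre- and post-treatment surrogate tissue samples described above can be illustrated by a minimal sketch. It assumes that keratin expression has already been measured as normalized, strictly positive values per gene; the function name, the gene labels used as keys and all numbers are hypothetical and are not taken from the data of this invention.

```python
import math


def log2_fold_changes(pre, post):
    """Return log2(post / pre) for every gene measured in both samples.

    `pre` and `post` map gene labels to normalized, strictly positive
    expression values (hypothetical units).
    """
    shared = sorted(set(pre) & set(post))
    return {gene: math.log2(post[gene] / pre[gene]) for gene in shared}


# Made-up values for keratin 5 and keratin 14 in a skin punch biopsy.
pre_treatment = {"KRT5": 10.0, "KRT14": 8.0}
post_treatment = {"KRT5": 40.0, "KRT14": 4.0}

# KRT5 rises fourfold (+2.0), KRT14 halves (-1.0).
shifts = log2_fold_changes(pre_treatment, post_treatment)
```

A log ratio is used here only because it treats increases and decreases symmetrically; any other measure of expression shift could be substituted.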
It is yet another embodiment of the invention that adverse drug responses such as heart toxicities
can also be deduced from characteristics of the ARCHEONs described. Of particular interest are
the ARCHEONs at 17q12-24, 12q13 and 3q21-26. It is known that anthracycline-based anti-cancer
regimens result in heart toxicities (such as dilated cardiomyopathies), as can be deduced e.g. by
LVEF measurements. Moreover, anthracycline pretreated patients have significantly increased heart
toxicity events upon a subsequent Herceptin™ based regimen. Interestingly, the ARCHEONs
described in this invention not only harbor the primary targets of these therapies (i.e.
topoisomerases and Her-2/neu), but also important structural and functional genes (Telethonin,
PNMT, CACNB1, PPARBP, Her-2/neu, Her4) for muscle function including heart muscle
function. These genes are involved in central processes of heart muscle function, such as tyrosine
phosphorylation, serine/threonine phosphorylation and calcium influx, regulating e.g. central
structural proteins such as titin. Moreover, these genes can be colocalized in heart muscles,
displaying their functional interplay in this tissue. In mouse models, the mislocalization of
telethonin and the genetic inactivation of Her-2/neu, Her4 and Neuregulin result in a phenotype
similar to that seen in cancer patients being treated with diverse anti-cancer drugs, including the
synergistic adverse drug response effect seen for the combinatorial treatment with anthracycline and
Herceptin™. Delineation of polymorphisms and haplotypes of the respective genes, genomic
region and/or the ARCHEON structure is indicative of the susceptibility to suffer from heart
toxicities upon anti-cancer drug treatment. This is important for therapy decisions and cancer
treatment management, as the prior therapies conducted exclude subsequent treatment options. For
example, anthracycline-based pretreatment can exclude subsequent Herceptin™ treatment or lead to
reduced dosages, if possible heart toxicities (e.g. dilated cardiomyopathies) cannot be excluded.

According to the observations described above the following examples of genes at 3q21-26 are 
offered by way of illustration, not by way of limitation. 

* WNT5A, CACNA1D, THRB, RARB, TOP2B, RAB5B, SMARCC1 (BAF155), RAF,
WNT7A

The following examples of genes at 12q13 are offered by way of illustration, not by way of
limitation.

* CACNB3, Keratins, ERBB3, NR4A1, RAB5/13, RARG, STAT6, WNT10B, (GCN5), (SAS:
Sarcoma Amplified Sequence), SMARCC2 (BAF170), SMARCD1 (BAF60A), (GAS41:
Glioma Amplified Sequence), (CHOP), Her3, KRTHB, HOXC, IGFBP6, WNT5B

There is cross-talk between the amplified ARCHEONs described above and some other highly 
amplified genomic regions located approximately at 1p13, 1q32, 2p16, 2q21, 3p12, 5p13, 6p12,
7p12, 7q21, 8q23, 11q13, 13q12, 19q13, 20q13 and 21q11. The above mentioned chromosomal
regions are described by way of illustration, not by way of limitation, as the amplified regions often
span larger and/or overlapping positions at these chromosomal positions.

Additional alterations of non-transcribed genes, pseudogenes or intergenic regions of said 
chromosomal locations can be measured for prediction, diagnosis, prognosis, prevention and
treatment of malignant neoplasia and breast cancer in particular. Some of the genes or genomic 
regions have no direct influence on the members of the ARCHEONs or the genes within distinct 
chromosomal regions but still retain marker gene function due to their chromosomal positioning in 
the neighborhood of functionally critical genes (e.g. Telethonin neighboring the Her-2/neu gene). 

Clinical Relevance of the genes which are part of the 17q21 ARCHEON for Response to
Herceptin treatment

Clinical samples of patients being treated with Herceptin, docetaxel, paclitaxel, taxotere,
carboplatin, cisplatin, oxaliplatin, vinorelbine, tamoxifen, anastrozole, letrozole,
epirubicin, doxorubicin and CMF were obtained. Primary tumor tissues and lymph node tissues
were obtained from neoadjuvant and adjuvant settings. In addition, biopsy material of first and
second line therapies was obtained in some cases from metastatic lesions. These samples included
formalin-fixed and paraffin-embedded material or fresh tissue from primary tumours and
metastatic lesions of the respective patients. Moreover, whole blood, serum and plasma samples
were included in the analysis.

Multiparametric clinical assessment of the response to Herceptin in combination with
chemotherapeutics (e.g. docetaxel, taxotere, paclitaxel, vinorelbine, carboplatin, cisplatin), or other
therapies described below, was performed, based on clinical information, such as histological
parameters (TNM-Stage, AJCC grade), standard molecular markers (IHC staining for estrogen
receptor, progesterone receptor, Her-2/neu) and sonographical or radiological assessment (e.g. CT).
In addition to combinatorial treatment, samples from single agent therapies were evaluated.
Response to treatment was evaluated according to international standards. The ARCHEON genes
were analyzed on DNA, RNA or protein level. Normalization of the ARCHEON genes was done
by intra- or extrachromosomal reference genes (see EXAMPLE 3 below) or by housekeeping
genes of diverse expression level.
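
Normalization against reference genes can be sketched as follows. This is only an illustration under stated assumptions: it assumes quantitative PCR threshold cycles (Ct) as the readout and a perfect doubling of product per cycle, neither of which is specified in this passage, and the function name and Ct values are hypothetical.

```python
from statistics import mean


def relative_quantity(target_ct, reference_cts):
    """Relative amount of a target gene versus the mean of reference genes.

    Assumes qPCR threshold cycles (Ct) and perfect doubling per cycle,
    i.e. relative quantity = 2 ** -(Ct_target - mean(Ct_references)).
    """
    if not reference_cts:
        raise ValueError("at least one reference gene is required")
    delta_ct = target_ct - mean(reference_cts)
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values: the target crosses the threshold two cycles
# earlier than the references, i.e. it is about four times more abundant.
ratio = relative_quantity(24.0, [26.0, 26.0])
```

Averaging over several reference genes, rather than relying on a single one, reduces the influence of any one reference that is itself altered in the sample.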

We could delineate specific regions of the ARCHEON to be informative for the response to a
Herceptin-based therapy. As depicted below, genes that are located towards the centromere or
telomere of an individual chromosome in relation to a centrally localized gene within an
ARCHEON (e.g. Her-2/neu in the 17q21 ARCHEON) are in the following named
"centromeric" and "telomeric", respectively. Of particular interest for response to Herceptin-based
treatment are genes being centromeric from the Her-2/neu gene locus on 17q21. The integrity of
this centromeric ARCHEON region is of importance for the phenotype of Her-2/neu positive
tumors. Genetic alterations in the chromosomal region of PIP5K2B, FLJ20291, MLN50, TEM7,
CACNB1, RPL19, MGC15482, PPARBP, CrkRS are critical for clinical outcome of Her-2/neu
positive breast tumors. Of particular interest is the centromeric breakpoint region of the 17q21
ARCHEON nearby the genes TEM7, CACNB1, CrkRS and PPARBP. Her-2/neu positive tumors
bearing elevated gene copy numbers of TEM7, CACNB1, CrkRS and PPARBP compared to other
Her-2/neu positive tumors and/or normal tissue controls have a worsened clinical outcome and
a poor response to Herceptin based treatment. The genes within this region are involved in calcium
and inositol signalling, which is fundamental with regard to cell survival mechanisms (e.g.
CACNB1, PPP1R1B and PIP5K2B). Overexpression of CrkRS is of importance for the tumor
phenotype, as its kinase activity regulates the RNA polymerase II holoenzyme complex. Especially
the phosphorylation of the C-terminal domain and its associated components not only has
influence on the general activity of the enzyme complex, but also affects gene products, whose
importance for tumor cell growth has been demonstrated and some of which are part of the
ARCHEONs (e.g. the SWI/SNF components SMARCs, e.g. SMARCC2, are critical for RB
mediated tumor suppression). Phosphorylation of SMARCs is tightly regulated during cell cycle
progression and affects the biological function of the SMARCs (influence on activity, stability
and cellular localization). Altered phosphorylation of the RNA polymerase holoenzyme complex
by CrkRS therefore most probably affects cell cycle progression. Moreover, the abnormal
expression of TEM7, which we found to be elevated in a subclass of Her-2/neu positive tumors,
whereas it was originally identified to be a tumor endothelial marker (TEM; see above), points
towards an intense interplay between tumor and endothelial cells resulting in a more aggressive
behaviour of the respective tumor cells during metastatic processes such as intra- and
extravasation. Strikingly, the genes within this region, i.e. ZNF144, TEM7, PIP5K, PPP1R1B and
CACNB1, all have physiological functions within the central nervous system. Therefore, we
assume that a "neuronal environment" would be favourable for tumor cells overexpressing these
genes resulting in growth and survival advantages for these particular tumor cells. In accordance
with this, it is observed that Herceptin resistant metastases frequently occur in the brain. So far it
has been discussed that this observation refers to toxicological problems such as drug
bioavailability with respect to the blood brain barrier. It is part of this invention that genes which
are normally expressed within neuronal cells are an integral part of the centromeric gene cluster of the
ARCHEON on chromosome 17q21 and are involved in de novo and acquired resistance to
Herceptin based treatment. Independent amplification units and/or deletion of singular genes of
this centromeric cluster due to chromosomal breakage interfere with the survival and resistance
function of this genomic region. Therefore the continuity of amplification units is another
important feature with regard to responsiveness or unresponsiveness to therapy. It is noteworthy
that not only the presence of particular genes, but also the presence of regulatory
elements within this genomic region contributes to the above mentioned biological phenotype.
Therefore also the loss or gain of regulatory elements within the centromeric part of the
ARCHEON is of importance for resistance to anti cancer treatment and therefore part of this
invention.
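
The two notions used above, namely the position of a gene relative to a central gene and the continuity of an amplification unit, can be made concrete with a minimal sketch. It assumes that gene coordinates on a q arm increase with distance from the centromere and that per-gene copy numbers are available in chromosomal order; the function names, gene labels, coordinates, copy numbers and the threshold are all hypothetical placeholders, not map positions or measurements from this invention.

```python
def classify_relative_to_anchor(positions, anchor):
    """Label genes as centromeric or telomeric relative to `anchor`.

    `positions` maps gene labels to coordinates on a q arm, where smaller
    coordinates lie closer to the centromere (an assumption of this sketch).
    """
    anchor_pos = positions[anchor]
    labels = {}
    for gene, pos in positions.items():
        if gene == anchor:
            continue
        labels[gene] = "centromeric" if pos < anchor_pos else "telomeric"
    return labels


def is_continuous_amplicon(copy_numbers, threshold):
    """True if all genes above `threshold` form one uninterrupted run.

    `copy_numbers` lists per-gene copy numbers in chromosomal order.
    """
    amplified = [i for i, cn in enumerate(copy_numbers) if cn > threshold]
    if not amplified:
        return False
    return amplified[-1] - amplified[0] + 1 == len(amplified)


# Placeholder gene labels and coordinates, not real map positions.
positions = {"GENE_A": 100, "ANCHOR": 200, "GENE_B": 300}
labels = classify_relative_to_anchor(positions, "ANCHOR")

# One uninterrupted amplified run versus a run broken by a non-amplified gene.
continuous = is_continuous_amplicon([2, 6, 7, 5, 2], threshold=4)
broken = is_continuous_amplicon([6, 2, 7, 5, 2], threshold=4)
```

A single fixed threshold is a simplification; in practice the cut-off for calling a gene amplified would depend on the assay and on the chosen reference genes.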

In addition to the alteration of the centromeric ARCHEON region, the total length of the ARCHEON
with regard to the telomeric region and the relative gene copy numbers of the amplified genes are
of importance. Particularly the integrity of the genomic region harboring the TOP2alpha gene with
the surrounding genes THRA, NR1D1, MLN51, WIRE, HsCDC6, RARA, CTEN, IGFBP4, EBI1
and SMARCE1 is of interest. Her-2/neu positive tumors that are deleted in at least some of these
genes exhibit a worsened response to Herceptin-based chemotherapy. This demonstrates that this
region is not only of prognostic value for anthracycline-based therapy, but also of prognostic value
for chemotherapeutic treatment with taxol-related agents and platinum salts. The amplification,
deletion or silencing of this telomeric region is accompanied by altered sensitivity to the above
mentioned chemotherapeutics. This is a general feature of tumors bearing alterations (with regard
to gene expression and/or amplification of the 17q21 ARCHEON) and not only true for breast
cancer. In line with this, we have analyzed ovarian tumors bearing alterations in the 17q21
ARCHEON and correlated the clinical outcome, which was assessed similarly as depicted above,
with regard to a platinum salt-based therapy. Strikingly, tumors with defined genetic patterns within
this telomeric region did develop resistance to this chemotherapeutic regimen. Detecting solely
the coamplification of Her-2/neu and TOP2alpha was not as informative with regard to response
prediction as a detailed characterization and subsequent response correlation with the region of the
THRA, NR1D1, MLN51, WIRE, HsCDC6 and RARA genes. It is part of this invention that the
proliferation status of tumors is affected by genes within ARCHEON regions. The 17q21
ARCHEON determines to at least some extent the proliferation rate of tumor cells. Interestingly,
Her-2/neu positive tumors bearing elevated levels of a more limited number of genes, excluding
several genes in the telomeric region (i.e. TOP2alpha, HsCDC6), exhibit a relatively slow growth
rate, which diminishes the effect of chemotherapeutic drugs targeting proliferating cells and is one
of the reasons for the resistance of these tumors to said agents. Instead, these tumors have a higher
capacity with regard to invasiveness and have a diminished apoptotic rate, which to some extent
is attributable to the signaling of Her-2/neu via GRB7 and AKT kinase (also affected by inositols and
calcium, see above), respectively. Several genes within the telomeric region of the ARCHEONs
affect Her-2/neu signalling, such as RARA, THRA, IGFBP4, and alter the respective
characteristics of the cells including proliferation status.
The ARCHEONs being part of this invention are not only important for clinical response of
tumors to antibody-based therapies raised against EGFR and Her-2/neu signaling (e.g. Herceptin,
2C4 or cetuximab regimen) and to chemotherapeutic agents, but also are of importance for diverse
strategies of anti-hormonal treatment (e.g. Tamoxifen, Raloxifene, anastrozole, letrozole, Faslodex). In
particular, elevated levels of the PPARBP gene and protein and the integrity of the telomeric
hormone receptor region of the 17q21 ARCHEON, bearing THRA, NR1D1 and RARA, or its
related regions on the other ARCHEONs are of importance for these therapeutic regimens. In a
retrospective clinical study evaluating the above mentioned clinical parameters for adjuvant
treatment of breast cancer with tamoxifen, we observed that the overexpression of PPARBP
has impact on the overall survival of patients receiving this therapy. Overexpression of PPARBP
enables activity of estrogen and progesterone receptors irrespective of a bound ligand. Therefore,
the deregulation of PPARBP results in the activity of these hormone receptors in the absence of
the hormones and even in the presence of anti-hormones and thereby circumvents the anti-tumor
effect of anti-hormonal strategies, resulting in resistance of PPARBP overexpressing cells. In
addition, overexpression of hormone receptors other than estrogen receptor in tumor cells affects
activity of estrogen or the respective anti-hormones by competition for dimerization partners, such
as RXR, or transcriptional activator or repressor genes, such as CBP or NCOR. With regard to
tamoxifen treatment this clearly diminishes the effect of the anti-hormone, as the pool of the
transcriptional cofactors is reduced for the classical mode of action of tamoxifen within the
nucleus.

The invention further relates to the use of: 

A) a polynucleotide comprising at least one of the sequences of SEQ ID NO: 1 to 26 or 53 to
75;

B) a polynucleotide which hybridizes under stringent conditions to a polynucleotide specified
in (a), encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3;

C) a polynucleotide the sequence of which deviates from the polynucleotide specified in (a)
and (b) due to the degeneracy of the genetic code, encoding a polypeptide exhibiting the
same biological function as specified for the respective sequence in Table 2 or 3;

D) a polynucleotide which represents a specific fragment, derivative or allelic variation of a
polynucleotide sequence specified in (a) to (c);

E) an antisense molecule specifically targeting one of the polynucleotide sequences specified
in (a) to (d);

F) a purified polypeptide encoded by a polynucleotide sequence specified in (a) to (d);

G) a purified polypeptide comprising at least one of the sequences of SEQ ID NO: 27 to 52 or
76 to 98;

H) an antibody capable of binding to one of the polynucleotides specified in (a) to (d) or a
polypeptide specified in (f) and (g);

I) a reagent identified by any of the methods of claim 14 to 16 that modulates the amount or
activity of a polynucleotide sequence specified in (a) to (d) or a polypeptide specified in (f)
and (g)

in the preparation of a composition for the prevention, prediction, diagnosis, prognosis or a
medicament for the treatment of malignant neoplasia and breast cancer in particular.

Polynucleotides 

A „BREAST CANCER GENE" polynucleotide can be single- or double-stranded and comprises a
coding sequence or the complement of a coding sequence for a „BREAST CANCER GENE"
polypeptide. Degenerate nucleotide sequences encoding human „BREAST CANCER GENE"
polypeptides, as well as homologous nucleotide sequences which are at least about 50, 55, 60, 65,
70, preferably about 75, 90, 96, or 98% identical to the nucleotide sequences of SEQ ID NO: 1 to
26 or 53 to 75 also are „BREAST CANCER GENE" polynucleotides. Percent sequence identity
between the sequences of two polynucleotides is determined using computer programs such as
ALIGN which employ the FASTA algorithm, using an affine gap search with a gap open penalty
of -12 and a gap extension penalty of -2. Complementary DNA (cDNA) molecules, species
homologues, and variants of „BREAST CANCER GENE" polynucleotides which encode
biologically active „BREAST CANCER GENE" polypeptides also are „BREAST CANCER
GENE" polynucleotides.
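
As an illustration of the percent-identity thresholds above, the following minimal Python sketch computes the percent identity of two sequences that have already been aligned (gaps written as "-"). It does not reproduce ALIGN or the FASTA algorithm; the function name `percent_identity` and the decision to count identical positions over all aligned columns, with gap columns treated as mismatches, are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base.

    Both inputs must already be aligned and of equal length; '-' marks a gap.
    Columns in which either sequence has a gap count as mismatches
    (an assumption of this sketch, not a rule taken from the text).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

A candidate sequence could then be compared against the thresholds named above, for example `percent_identity(a, b) >= 75.0`.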

Preparation of Polynucleotides 

A naturally occurring „BREAST CANCER GENE" polynucleotide can be isolated free of other
cellular components such as membrane components, proteins, and lipids. Polynucleotides can be
made by a cell and isolated using standard nucleic acid purification techniques, or synthesized
using an amplification technique, such as the polymerase chain reaction (PCR), or by using an
automatic synthesizer. Methods for isolating polynucleotides are routine and are known in the art.
Any such technique for obtaining a polynucleotide can be used to obtain isolated „BREAST
CANCER GENE" polynucleotides. For example, restriction enzymes and probes can be used to
isolate polynucleotide fragments which comprise „BREAST CANCER GENE" nucleotide
sequences. Isolated polynucleotides are in preparations which are free or at least 70, 80, or 90%
free of other molecules.

„BREAST CANCER GENE" cDNA molecules can be made with standard molecular biology 
techniques, using „BREAST CANCER GENE" mRNA as a template. Any RNA isolation 
technique which does not select against the isolation of mRNA may be utilized for the purification 
of such RNA samples. See, for example, Sambrook et al., 1989, (77); and Ausubel, F. M. et al., 
1989, (78), both of which are incorporated herein by reference in their entirety. Additionally, large
numbers of tissue samples may readily be processed using techniques well known to those of skill 
in the art, such as, for example, the single-step RNA isolation process of Chomczynski, P. (1989, 
U.S. Pat. No. 4,843,155), which is incorporated herein by reference in its entirety. 

„BREAST CANCER GENE" cDNA molecules can thereafter be replicated using molecular 
biology techniques known in the art and disclosed in manuals such as Sambrook et al., 1989, (77).
An amplification technique, such as PCR, can be used to obtain additional copies of 
polynucleotides of the invention, using either human genomic DNA or cDNA as a template. 

Alternatively, synthetic chemistry techniques can be used to synthesize „BREAST CANCER
GENE" polynucleotides. The degeneracy of the genetic code allows alternate nucleotide sequences
to be synthesized which will encode a „BREAST CANCER GENE" polypeptide or a biologically
active variant thereof.

Identification of differential expression 

Transcripts within the collected RNA samples which represent RNA produced by differentially 
expressed genes may be identified by utilizing a variety of methods which are well known to those
of skill in the art. For example, differential screening [Tedder, T. F. et al., 1988, (79)], subtractive
hybridization [Hedrick, S. M. et al., 1984, (80); Lee, S. W. et al., 1984, (81)], and, preferably, 
differential display (Liang, P., and Pardee, A. B., 1993, U.S. Pat. No. 5,262,311, which is 
incorporated herein by reference in its entirety), may be utilized to identify polynucleotide 
sequences derived from genes that are differentially expressed. 

Differential screening involves the duplicate screening of a cDNA library in which one copy of the
library is screened with a total cell cDNA probe corresponding to the mRNA population of one
cell type while a duplicate copy of the cDNA library is screened with a total cDNA probe
corresponding to the mRNA population of a second cell type. For example, one cDNA probe may
correspond to a total cell cDNA probe of a cell type derived from a control subject, while the
second cDNA probe may correspond to a total cell cDNA probe of the same cell type derived from
an experimental subject. Those clones which hybridize to one probe but not to the other potentially
represent clones derived from genes differentially expressed in the cell type of interest in control
versus experimental subjects.

Subtractive hybridization techniques generally involve the isolation of mRNA taken from two
different sources, e.g., control and experimental tissue, the hybridization of the mRNA or
single-stranded cDNA reverse-transcribed from the isolated mRNA, and the removal of all hybridized,
and therefore double-stranded, sequences. The remaining non-hybridized, single-stranded cDNAs
potentially represent clones derived from genes that are differentially expressed in the two mRNA
sources. Such single-stranded cDNAs are then used as the starting material for the construction of
a library comprising clones derived from differentially expressed genes.

The differential display technique is a procedure utilizing the well known polymerase
chain reaction (PCR; the experimental embodiment set forth in Mullis, K. B., 1987, U.S. Pat. No.
4,683,202) which allows for the identification of sequences derived from genes which are
differentially expressed. First, isolated RNA is reverse-transcribed into single-stranded cDNA,
utilizing standard techniques which are well known to those of skill in the art. Primers for the
reverse transcriptase reaction may include, but are not limited to, oligo dT-containing primers,
preferably of the reverse primer type of oligonucleotide described below. Next, this technique uses
pairs of PCR primers, as described below, which allow for the amplification of clones representing
a random subset of the RNA transcripts present within any given cell. Utilizing different pairs of
primers allows each of the mRNA transcripts present in a cell to be amplified. Among such
amplified transcripts may be identified those which have been produced from differentially
expressed genes.

The reverse oligonucleotide primer of the primer pairs may contain an oligo dT stretch of
nucleotides, preferably eleven nucleotides long, at its 5' end, which hybridizes to the poly(A) tail
of mRNA or to the complement of a cDNA reverse transcribed from an mRNA poly(A) tail.
In addition, in order to increase the specificity of the reverse primer, the primer may contain one or
more, preferably two, additional nucleotides at its 3' end. Because, statistically, only a subset of the
mRNA derived sequences present in the sample of interest will hybridize to such primers, the
additional nucleotides allow the primers to amplify only a subset of the mRNA derived sequences
present in the sample of interest. This is preferred in that it allows more accurate and complete
visualization and characterization of each of the bands representing amplified sequences.
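
The reverse primer layout described above (an oligo dT stretch of eleven nucleotides followed by two additional 3' nucleotides) can be enumerated with a short Python sketch. The function name `anchored_reverse_primers` is hypothetical, and the restriction of the first additional base to A, C or G is an assumption borrowed from common differential display practice; the text itself only calls for "additional nucleotides".

```python
from itertools import product


def anchored_reverse_primers(dt_length: int = 11) -> list[str]:
    """Enumerate oligo dT primers carrying two additional 3' nucleotides.

    Assumption: the first additional base is A, C or G (not T), so that the
    primer anchors at the junction between transcript and poly(A) tail.
    The second additional base may be any of A, C, G or T.
    """
    stretch = "T" * dt_length
    return [stretch + first + second for first, second in product("ACG", "ACGT")]


primers = anchored_reverse_primers()
```

Under these assumptions the set contains 3 x 4 = 12 primers, each expected to prime only a subset of the poly(A)-tailed transcripts.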

The forward primer may contain a nucleotide sequence expected, statistically, to have the ability to
hybridize to cDNA sequences derived from the tissues of interest. The nucleotide sequence may be
an arbitrary one, and the length of the forward oligonucleotide primer may range from about 9 to
about 13 nucleotides, with about 10 nucleotides being preferred. Arbitrary primer sequences cause
the lengths of the amplified partial cDNAs produced to be variable, thus allowing different clones
to be separated by using standard denaturing sequencing gel electrophoresis. PCR reaction
conditions should be chosen which optimize amplified product yield and specificity, and,
additionally, produce amplified products of lengths which may be resolved utilizing standard gel
electrophoresis techniques. Such reaction conditions are well known to those of skill in the art, and
important reaction parameters include, for example, length and nucleotide sequence of
oligonucleotide primers as discussed above, and annealing and elongation step temperatures and
reaction times. The pattern of clones resulting from the reverse transcription and amplification of
the mRNA of two different cell types is displayed via sequencing gel electrophoresis and
compared. Differences in the two banding patterns indicate potentially differentially expressed
genes.

When screening for full-length cDNAs, it is preferable to use libraries that have been size-selected 
to include larger cDNAs. Randomly-primed libraries are preferable, in that they will contain more 
sequences which contain the 5' regions of genes. Use of a randomly primed library may be 
especially preferable for situations in which an oligo d(T) library does not yield a full-length 
cDNA. Genomic libraries can be useful for extension of sequence into 5' nontranscribed regulatory
regions. 

Commercially available capillary electrophoresis systems can be used to analyze the size or
confirm the nucleotide sequence of PCR or sequencing products. For example, capillary
sequencing can employ flowable polymers for electrophoretic separation, four different fluorescent
dyes (one for each nucleotide) which are laser activated, and detection of the emitted wavelengths
by a charge coupled device camera. Output/light intensity can be converted to electrical signal
using appropriate software (e.g. GENOTYPER and Sequence NAVIGATOR, Perkin Elmer; ABI),
and the entire process from loading of samples to computer analysis and electronic data display
can be computer controlled. Capillary electrophoresis is especially preferable for the sequencing
of small pieces of DNA which might be present in limited amounts in a particular sample.

Once potentially differentially expressed gene sequences have been identified via bulk techniques 
such as, for example, those described above, the differential expression of such putatively 
differentially expressed genes should be corroborated. Corroboration may be accomplished via, for 
example, such well known techniques as Northern analysis and/or RT-PCR. Upon corroboration, 
the differentially expressed genes may be further characterized, and may be identified as target
and/or marker genes, as discussed below.

Also, amplified sequences of differentially expressed genes obtained through, for example, 
differential display may be used to isolate full length clones of the corresponding gene. The full 
length coding portion of the gene may readily be isolated, without undue experimentation, by
molecular biological techniques well known in the art. For example, the isolated differentially
expressed amplified fragment may be labeled and used to screen a cDNA library. Alternatively, the 
labeled fragment may be used to screen a genomic library. 

An analysis of the tissue distribution of the mRNA produced by the identified genes may be 
conducted, utilizing standard techniques well known to those of skill in the art. Such techniques
may include, for example, Northern analyses and RT-PCR. Such analyses provide information as 
to whether the identified genes are expressed in tissues expected to contribute to breast cancer. 
Such analyses may also provide quantitative information regarding steady state mRNA regulation, 
yielding data concerning which of the identified genes exhibits a high level of regulation in, 
preferably, tissues which may be expected to contribute to breast cancer.

Such analyses may also be performed on an isolated cell population of a particular cell type 
derived from a given tissue. Additionally, standard in situ hybridization techniques may be utilized 
to provide information regarding which cells within a given tissue express the identified gene. 
Such analyses may provide information regarding the biological function of an identified gene 
relative to breast cancer in instances wherein only a subset of the cells within the tissue is thought
to be relevant to breast cancer. 

Identification of co-amplified genes

Genes involved in genomic alterations (amplifications, insertions, translocations, deletions, etc.)
are identified by PCR-based karyotyping in combination with database analysis. Of particular
interest are gene amplifications, which account for gene copy numbers >2 per cell. Gene copy
number and gene expression of the respective genes often correlate. Therefore clusters of genes
being simultaneously overexpressed due to gene amplifications can be identified by expression
analysis via DNA-chip technologies or quantitative RT-PCR. For example, the altered expression
of genes due to increased or decreased gene copy numbers can be determined by GeneArray™
technologies from Affymetrix or qRT-PCR with the TaqMan or iCycler systems. Moreover, the
combination of RNA with DNA analysis enables highly parallel and automated characterization of
multiple genomic regions of variable length with high resolution in tissue or single cell samples.
Furthermore, these assays enable the correlation of gene transcription relative to gene copy number
of target genes. As there is not necessarily a linear correlation of expression level and gene copy
number and as there are synergistic or antagonistic effects in certain gene clusters, the
identification on the RNA level is easier and probably more relevant for the biological outcome of
the alterations, especially in tumor tissue.

Detection of co-amplified genes in malignant neoplasia

Chromosomal changes are commonly detected by FISH (Fluorescence In Situ Hybridization) and
CGH (Comparative Genomic Hybridization). For quantification of genomic regions, genes or
intergenic regions can be used. Such quantification measures the relative abundance of multiple
genes with respect to each other (e.g. target gene vs. centromeric region or housekeeping genes).
Changes in relative abundance can be detected in paraffin-embedded material even after extraction
of RNA or genomic DNA. Measurement of genomic DNA has advantages over RNA analysis
due to the stability of DNA, which makes it possible to perform retrospective studies as well
and offers multiple internal controls (genes not being altered, amplified or
deleted) for standardization and exact calculations. Moreover, PCR analysis of genomic DNA
offers the advantage of investigating intergenic, highly variable regions or combinations of SNPs
(Single Nucleotide Polymorphisms), RFLPs, VNTRs and STRs (in general, polymorphic
markers). SNPs or polymorphic markers within defined genomic regions (determined e.g. by
SNP analysis by "Pyrosequencing™") have an impact on the phenotype of the genomic alterations. For
example, it is of advantage to determine combinations of polymorphisms or haplotypes in order to
characterize the biological potential of genes being part of amplified alleles. Of particular interest
are polymorphic markers in breakpoint regions, coding regions or regulatory regions of genes or
intergenic regions. By determining predictive haplotypes with defined biological or clinical
outcome it is possible to establish diagnostic and prognostic assays with non-tumor samples from
patients. Depending on whether preferably one allele or both alleles to the same extent are amplified
(= linear or non-linear amplifications), haplotypes can be determined. Overrepresentation of
specific polymorphic marker combinations in cells or tissues with gene amplifications facilitates
haplotype determination, as e.g. combinations of heterozygous polymorphic markers in nucleic
acids isolated from normal tissues, body fluids or biological samples of one patient become almost
homozygous in neoplastic tissue of the very same patient. This "gain of homozygosity"
corresponds to the measurement of altered genomic regions due to amplification events and is
suitable for identification of "gain of function" alterations in tumors, which result in e.g.
oncogenic or growth-promoting activities. In contrast, the detection of "losses of heterozygosity" is
used for identification of anti-oncogenes, gatekeeper genes or checkpoint genes that suppress
oncogenic activities and negatively regulate cellular growth processes. This intrinsic difference
clearly contrasts the impact of the respective genomic regions on tumor development and
emphasizes the significance of the "gain of homozygosity" measurements disclosed in this invention.
In addition to the analyses on SNPs, a comparative approach of blood leucocyte DNA and tumor
DNA based on VNTR detection can reveal the existence of a previously described ARCHEON.
SNP and VNTR sequences and primer sets most suitable for detection of the ARCHEON at 17q11-21
are disclosed in Table 4 and Table 6. Detection, quantification and sizing of such polymorphic
markers can be achieved by methods known to those with skill in the art. In one embodiment of
this invention we disclose the comparative measurement of amount and size of any of the disclosed
VNTRs (Table 6) by PCR amplification and capillary electrophoresis (CE). PCR can be carried out by
standard protocols, preferably in a linear amplification range (low cycle number), and detection by
CE should be carried out according to the supplier's protocols (e.g. Agilent). More preferably, the detection of the
VNTRs disclosed in Table 6 can be carried out in a multiplex fashion, utilizing a variety of labeled
primers (e.g. fluorescent, radioactive, bioactive) and a suitable CE detection system (e.g. ABI 310).
However, the detection can also be performed on slab gels consisting of highly concentrated agarose
or polyacrylamide with a monochromatic DNA stain. Enhancement of resolution can be achieved by
appropriate primer design and length variation to give best results in multiplex PCR.
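
The comparative measurement of normal (e.g. blood leucocyte) and tumor DNA described above can be sketched in Python as a ratio of allele ratios for one heterozygous marker. This is only an illustration: the function names, the use of peak areas as the input quantity and the two-fold cut-off are hypothetical choices, not values disclosed in the text.

```python
def allelic_imbalance(normal_peaks: tuple[float, float],
                      tumor_peaks: tuple[float, float]) -> float:
    """Ratio of the tumor allele ratio to the normal allele ratio.

    Each tuple holds the peak areas (e.g. from capillary electrophoresis) of
    allele 1 and allele 2 of one heterozygous marker. A value near 1.0 means
    no shift; values far from 1.0 mean that one allele is overrepresented
    in the tumor sample.
    """
    n1, n2 = normal_peaks
    t1, t2 = tumor_peaks
    if min(n1, n2, t1, t2) <= 0:
        raise ValueError("peak areas must be positive")
    return (t1 / t2) / (n1 / n2)


def is_shifted(imbalance: float, threshold: float = 2.0) -> bool:
    """Flag a marker whose allele ratio changed at least `threshold`-fold in
    either direction (the 2.0 cut-off is a hypothetical choice)."""
    return imbalance >= threshold or imbalance <= 1.0 / threshold
```

The ratio alone does not distinguish amplification of one allele from loss of the other; that would additionally require a copy-number measurement against an unaltered reference region.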

It is also of interest to determine covalent modifications of DNA (e.g. methylation) or the 
associated chromatin (e.g. acetylation or methylation of associated proteins) within the altered 
genomic regions, that have impact on transcriptional activity of the genes. In general, by measuring 
multiple, short sequences (60-300 bp) these techniques enable high-resolution analysis of target 
regions, which cannot be obtained by conventional methods such as FISH analysis (2-100 kb).
Moreover the PCR-based DNA analysis techniques offer advantages with regard to sensitivity, 
specificity, multiplexing, time consumption and low amount of patient material required. These 
techniques can be optimized by combination with microdissection or macrodissection to obtain 
purer starting material for analysis. 

Extending Polynucleotides 

In one embodiment of such a procedure for the identification and cloning of full length gene
sequences, RNA may be isolated, following standard procedures, from an appropriate tissue or
cellular source. A reverse transcription reaction may then be performed on the RNA using an
oligonucleotide primer complementary to the mRNA that corresponds to the amplified fragment,
for the priming of first strand synthesis. Because the primer is anti-parallel to the mRNA,
extension will proceed toward the 5' end of the mRNA. The resulting RNA/DNA hybrid may then be
"tailed" with guanines using a standard terminal transferase reaction, the hybrid may be digested
with RNase H, and second strand synthesis may then be primed with a poly-C primer. Using the
two primers, the 5' portion of the gene is amplified using PCR. Sequences obtained may then be
isolated and recombined with previously isolated sequences to generate a full-length cDNA of the
differentially expressed genes of the invention. For a review of cloning strategies and recombinant
DNA techniques, see e.g., Sambrook et al., (77); and Ausubel et al., (78).

Various PCR-based methods can be used to extend the polynucleotide sequences disclosed herein 
to detect upstream sequences such as promoters and regulatory elements. For example, restriction
site PCR uses universal primers to retrieve unknown sequence adjacent to a known locus [Sarkar, 
1993, (82)]. Genomic DNA is first amplified in the presence of a primer to a linker sequence and a 
primer specific to the known region. The amplified sequences are then subjected to a second round 
of PCR with the same linker primer and another specific primer internal to the first one. Products 
of each round of PCR are transcribed with an appropriate RNA polymerase and sequenced using
reverse transcriptase. 

Inverse PCR also can be used to amplify or extend sequences using divergent primers based on a
known region [Triglia et al., 1988, (83)]. Primers can be designed using commercially available
software, such as OLIGO 4.06 Primer Analysis software (National Biosciences Inc., Plymouth,
Minn.), to be e.g. 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal
to the target sequence at temperatures of about 68-72°C. The method uses several restriction enzymes
to generate a suitable fragment in the known region of a gene. The fragment is then circularized by
intramolecular ligation and used as a PCR template.
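
The length and GC criteria named above can be checked with a few lines of Python. This is a minimal sketch, not a substitute for primer analysis software such as OLIGO; the function names are hypothetical, and the annealing temperature criterion (about 68-72°C) is deliberately not evaluated because the text does not specify how it is calculated.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def meets_primer_criteria(primer: str,
                          min_len: int = 22,
                          max_len: int = 30,
                          min_gc: float = 50.0) -> bool:
    """Check the length (22-30 nt) and GC content (>= 50%) criteria only."""
    return min_len <= len(primer) <= max_len and gc_content(primer) >= min_gc
```

For example, `meets_primer_criteria("GCGCATATGCGCATATGCGCATAT")` evaluates a 24-mer with 50% GC content and accepts it.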

Another method which can be used is capture PCR, which involves PCR amplification of DNA 
fragments adjacent to a known sequence in human and yeast artificial chromosome DNA
[Lagerstrom et al., 1991, (84)]. In this method, multiple restriction enzyme digestions and ligations 
also can be used to place an engineered double-stranded sequence into an unknown fragment of the 
DNA molecule before performing PCR. 

Additionally, PCR, nested primers, and PROMOTERFINDER libraries (CLONTECH, Palo Alto,
Calif.) can be used to walk genomic DNA. This process avoids
the need to screen libraries and is useful in finding intron/exon junctions.

The sequences of the identified genes may be used, utilizing standard techniques, to place the 
genes onto genetic maps, e.g., mouse [Copeland & Jenkins, 1991, (85)] and human genetic maps 
[Cohen et al., 1993, (86)]. Such mapping information may yield information regarding the genes'
importance to human disease by, for example, identifying genes which map near genetic regions to
which known genetic breast cancer tendencies map. 

Identification of polynucleotide variants and homologues or splice variants

Variants and homologues of the „BREAST CANCER GENE" polynucleotides described above
also are „BREAST CANCER GENE" polynucleotides. Typically, homologous „BREAST
CANCER GENE" polynucleotide sequences can be identified by hybridization of candidate
polynucleotides to known „BREAST CANCER GENE" polynucleotides under stringent
conditions, as is known in the art. For example, using the following wash conditions: 2 X SSC
(0.3 M NaCl, 0.03 M sodium citrate, pH 7.0), 0.1% SDS, room temperature twice, 30 minutes
each; then 2 X SSC, 0.1% SDS, 50°C once, 30 minutes; then 2 X SSC, room temperature twice,
10 minutes each, homologous sequences can be identified which contain at most about 25-30%
basepair mismatches. More preferably, homologous polynucleotide strands contain 15-25%
basepair mismatches, even more preferably 5-15% basepair mismatches.

Species homologues of the „BREAST CANCER GENE" polynucleotides disclosed herein also can
be identified by making suitable probes or primers and screening cDNA expression libraries from
other species, such as mice, monkeys, or yeast. Human variants of „BREAST CANCER GENE"
polynucleotides can be identified, for example, by screening human cDNA expression libraries. It
is well known that the Tm of a double-stranded DNA decreases by 1-1.5°C with every 1% decrease
in homology [Bonner et al., 1973, (87)]. Variants of human „BREAST CANCER GENE"
polynucleotides or „BREAST CANCER GENE" polynucleotides of other species can therefore be
identified by hybridizing a putative homologous „BREAST CANCER GENE" polynucleotide with
a polynucleotide having a nucleotide sequence of one of the sequences of the SEQ ID NO: 1 to 26
or 53 to 75 or the complement thereof to form a test hybrid. The melting temperature of the test
hybrid is compared with the melting temperature of a hybrid comprising polynucleotides having
perfectly complementary nucleotide sequences, and the number or percent of basepair mismatches
within the test hybrid is calculated.
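
The mismatch calculation described above follows directly from the stated rule of 1-1.5°C Tm decrease per 1% decrease in homology. A minimal Python sketch, with the hypothetical function name `percent_mismatch_range`, returns the range of percent mismatch consistent with a measured Tm difference:

```python
def percent_mismatch_range(tm_perfect_c: float,
                           tm_test_c: float) -> tuple[float, float]:
    """Estimate the percent basepair mismatch of a test hybrid.

    Uses the rule that Tm drops by 1 to 1.5 degrees C per 1% mismatch.
    Returns (lower_estimate, upper_estimate) in percent.
    """
    delta = tm_perfect_c - tm_test_c
    if delta < 0:
        raise ValueError("test hybrid Tm exceeds perfect-match Tm")
    # 1.5 degrees per percent gives the lower bound, 1.0 the upper bound.
    return (delta / 1.5, delta / 1.0)
```

A test hybrid melting 15°C below the perfectly matched hybrid would thus be estimated to carry roughly 10-15% mismatches.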

Nucleotide sequences which hybridize to „BREAST CANCER GENE" polynucleotides or their
complements following stringent hybridization and/or wash conditions also are „BREAST
CANCER GENE" polynucleotides. Stringent wash conditions are well known and understood in
the art and are disclosed, for example, in Sambrook et al., (77). Typically, for stringent
hybridization conditions a combination of temperature and salt concentration should be chosen
that is approximately 12-20°C below the calculated Tm of the hybrid under study. The Tm of a
hybrid between a „BREAST CANCER GENE" polynucleotide having a nucleotide sequence of
one of the sequences of the SEQ ID NO: 1 to 26 or 53 to 75 or the complement thereof and a
polynucleotide sequence which is at least about 50, preferably about 75, 90, 96, or 98% identical
to one of those nucleotide sequences can be calculated, for example, using the equation below
[Bolton and McCarthy, 1962, (88)]:

Tm = 81.5°C - 16.6(log10[Na+]) + 0.41(%G + C) - 0.63(%formamide) - 600/l,

where l = the length of the hybrid in basepairs.
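
As a worked illustration, the equation can be transcribed term by term into Python. The function and parameter names are hypothetical. The salt coefficient is kept with the sign printed in this text and is exposed as a parameter, because commonly cited forms of this relation write the salt term as +16.6 log10[Na+]; the sign should be checked against reference (88) before the result is relied upon.

```python
import math


def hybrid_tm_celsius(na_molar: float,
                      percent_gc: float,
                      percent_formamide: float,
                      length_bp: int,
                      salt_coefficient: float = -16.6) -> float:
    """Melting temperature of a hybrid, following the equation in the text.

    salt_coefficient defaults to -16.6, the sign as printed in this text
    (an assumption to re-check; other sources use +16.6).
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (81.5
            + salt_coefficient * math.log10(na_molar)
            + 0.41 * percent_gc
            - 0.63 * percent_formamide
            - 600.0 / length_bp)
```

Stringent hybridization conditions would then be chosen approximately 12-20°C below the returned value, as stated above.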

Stringent wash conditions include, for example, 4 X SSC at 65°C, or 50% formamide, 4 X SSC at 
28°C, or 0.5 X SSC, 0.1% SDS at 65°C. Highly stringent wash conditions include, for example, 
0.2 X SSC at 65°C.

The biological function of the identified genes may be more directly assessed by utilizing relevant 
in vivo and in vitro systems. In vivo systems may include, but are not limited to, animal systems 
which naturally exhibit breast cancer predisposition, or ones which have been engineered to 
exhibit such symptoms, including but not limited to the apoE-deficient malignant neoplasia mouse 
model [Plump et al., 1992, (89)].

Splice variants derived from the same genomic region and encoded by the same pre-mRNA can be
identified under the hybridization conditions described above for homology searches. The specific
characteristics of variant proteins encoded by splice variants of the same pre-transcript may differ
and can also be assayed as disclosed. A „BREAST CANCER GENE" polynucleotide having a
nucleotide sequence of one of the sequences of the SEQ ID NO: 1 to 26 or 53 to 75 or the
complement thereof may therefore differ in parts of the entire sequence, as presented for SEQ ID
NO: 60 and the encoded splice variants SEQ ID NO: 61 to 66. These refer to individual proteins
SEQ ID NO: 83 to 89. The prediction of splicing events and the identification of the utilized
acceptor and donor sites within the pre-mRNA can be computed (e.g. with the software packages GRAIL or
GenomeSCAN) and verified by PCR methods by those with skill in the art.

Antisense oligonucleotides 

Antisense oligonucleotides are nucleotide sequences which are complementary to a specific DNA 
or RNA sequence. Once introduced into a cell, the complementary nucleotides combine with 
natural sequences produced by the cell to form complexes and block either transcription or 
translation. Preferably, an antisense oligonucleotide is at least 6 nucleotides in length, but can be at 
least 7, 8, 10, 12, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides long. Longer sequences also 
can be used. Antisense oligonucleotide molecules can be provided in a DNA construct and 
introduced into a cell as described above to decrease the level of „BREAST CANCER GENE" 
gene products in the cell. 
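
Because an antisense oligonucleotide is defined above by its complementarity to a target sequence, a candidate can be derived as the reverse complement of that target. The following Python sketch assumes a DNA oligonucleotide and a target written 5' to 3'; the function name `antisense_dna` is hypothetical, and the default minimum length of 6 reflects the preferred lower limit given above.

```python
# Translation table: each base (DNA or RNA, either case) maps to its DNA complement.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def antisense_dna(target: str, min_length: int = 6) -> str:
    """Return the DNA oligonucleotide (5'->3') complementary to `target`.

    `target` is a DNA or RNA sequence written 5'->3'. Sequences shorter than
    `min_length` or containing characters other than A, C, G, T, U are rejected.
    """
    if len(target) < min_length:
        raise ValueError(f"target shorter than {min_length} nucleotides")
    invalid = set(target.upper()) - set("ACGTU")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement every base, then reverse to restore 5'->3' orientation.
    return target.translate(_COMPLEMENT)[::-1].upper()
```

For the RNA target `AUGGCC`, the function returns the DNA oligonucleotide `GGCCAT`.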

Antisense oligonucleotides can be deoxyribonucleotides, ribonucleotides, peptide nucleic acids
(PNAs; described in U.S. Pat. No. 5,714,331), locked nucleic acids (LNAs; described in WO
99/12826), or a combination of them. Oligonucleotides can be synthesized manually or by an
automated synthesizer, by covalently linking the 5' end of one nucleotide with the 3' end of another
nucleotide with non-phosphodiester internucleotide linkages such as alkylphosphonates,
phosphorothioates, phosphorodithioates, alkylphosphonothioates,
phosphoramidates, phosphate esters, carbamates, acetamidate, carboxymethyl esters, carbonates,
and phosphate triesters [Brown, 1994, (126); Sonveaux, 1994, (127) and Uhlmann et al., 1990,
(128)].

Modifications of „BREAST CANCER GENE" expression can be obtained by designing antisense
oligonucleotides which will form duplexes to the control, 5', or regulatory regions of the
„BREAST CANCER GENE". Oligonucleotides derived from the transcription initiation site, e.g.,
between positions -10 and +10 from the start site, are preferred. Similarly, inhibition can be
achieved using "triple helix" base-pairing methodology. Triple helix pairing is useful because it
causes inhibition of the ability of the double helix to open sufficiently for the binding of
polymerases, transcription factors, or chaperons. Therapeutic advances using triplex DNA have
been described in the literature [Gee et al., 1994, (129)]. An antisense oligonucleotide also can be
designed to block translation of mRNA by preventing the transcript from binding to ribosomes.

Precise complementarity is not required for successful complex formation between an antisense 
oligonucleotide and the complementary sequence of a "BREAST CANCER GENE" 
polynucleotide. Antisense oligonucleotides which comprise, for example, 2, 3, 4, or 5 or more 
stretches of contiguous nucleotides which are precisely complementary to a "BREAST CANCER 
GENE" polynucleotide, each separated by a stretch of contiguous nucleotides which are not 
complementary to adjacent "BREAST CANCER GENE" nucleotides, can provide sufficient 
targeting specificity for "BREAST CANCER GENE" mRNA. Preferably, each stretch of 
complementary contiguous nucleotides is at least 4, 5, 6, 7, or 8 or more nucleotides in length. 
Non-complementary intervening sequences are preferably 1, 2, 3, or 4 nucleotides in length. One 
skilled in the art can easily use the calculated melting point of an antisense-sense pair to determine 
the degree of mismatching which will be tolerated between a particular antisense oligonucleotide 
and a particular "BREAST CANCER GENE" polynucleotide sequence. 
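
By way of illustration only, the following Python sketch estimates a melting temperature with the Wallace rule (2 degrees C per A/T pair plus 4 degrees C per G/C pair), an approximation generally applied to short oligonucleotides, and lists the lengths of the precisely complementary stretches between an antisense oligonucleotide and a target site. The Wallace rule is used here merely as one simple way of calculating a melting point; the function names and the example sequences are hypothetical and are not taken from this specification.

```python
# Illustrative sketch only: names and sequences are hypothetical assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA (or RNA, U read as T) sequence."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def complementary_stretches(antisense: str, target_site: str) -> list[int]:
    """Lengths of contiguous, precisely complementary runs.

    The antisense strand pairs antiparallel to the target, so its reverse
    complement is compared position by position with the target site.
    """
    paired_with = reverse_complement(antisense)
    target = target_site.upper().replace("U", "T")
    if len(paired_with) != len(target):
        raise ValueError("antisense and target site must have equal length")
    runs, current = [], 0
    for a, b in zip(paired_with, target):
        if a == b:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs
```

For instance, an antisense oligonucleotide designed against a 14-nucleotide site and then compared with a variant of that site carrying a single internal mismatch yields two complementary stretches separated by one non-complementary nucleotide, which can be checked against the preferred stretch lengths given above.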

Antisense oligonucleotides can be modified without affecting their ability to hybridize to a 
"BREAST CANCER GENE" polynucleotide. These modifications can be internal or at one or 
both ends of the antisense molecule. For example, internucleoside phosphate linkages can be 
modified by adding cholesteryl or diamine moieties with varying numbers of carbon residues 
between the amino groups and terminal ribose. Modified bases and/or sugars, such as arabinose 
instead of ribose, or a 3', 5' substituted oligonucleotide in which the 3' hydroxyl group or the 5' 
phosphate group are substituted, also can be employed in a modified antisense oligonucleotide. 
These modified oligonucleotides can be prepared by methods well known in the art [Agrawal et 
al., 1992, (130); Uhlmann et al., 1987, (131) and Uhlmann et al., (128)]. 

Ribozymes 

Ribozymes are RNA molecules with catalytic activity [Cech, 1987, (132); Cech, 1990, (133) and 
Couture & Stinchcomb, 1996, (134)]. Ribozymes can be used to inhibit gene function by cleaving 
an RNA sequence, as is known in the art (e.g., Haseloff et al., U.S. Patent 5,641,673). The 
mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule 
to complementary target RNA, followed by endonucleolytic cleavage. Examples include 
engineered hammerhead motif ribozyme molecules that can specifically and efficiently catalyze 
endonucleolytic cleavage of specific nucleotide sequences. 

The transcribed sequence of a "BREAST CANCER GENE" can be used to generate ribozymes 
which will specifically bind to mRNA transcribed from a "BREAST CANCER GENE" genomic 
locus. Methods of designing and constructing ribozymes which can cleave other RNA molecules in 
trans in a highly sequence specific manner have been developed and described in the art [Haseloff 
et al., 1988, (135)]. For example, the cleavage activity of ribozymes can be targeted to specific 
RNAs by engineering a discrete "hybridization" region into the ribozyme. The hybridization region 
contains a sequence complementary to the target RNA and thus specifically hybridizes with the 
target [see, for example, Gerlach et al., EP 0 321 201]. 

Specific ribozyme cleavage sites within a "BREAST CANCER GENE" RNA target can be 
identified by scanning the target molecule for ribozyme cleavage sites which include the following 
sequences: GUA, GUU, and GUC. Once identified, short RNA sequences of between 15 and 20 
ribonucleotides corresponding to the region of the target RNA containing the cleavage site can be 
evaluated for secondary structural features which may render the target inoperable. Suitability of 
candidate "BREAST CANCER GENE" RNA targets also can be evaluated by testing accessibility 
to hybridization with complementary oligonucleotides using ribonuclease protection assays. 
Longer complementary sequences can be used to increase the affinity of the hybridization 
sequence for the target. The hybridizing and cleavage regions of the ribozyme can be integrally 
related such that upon hybridizing to the target RNA through the complementary regions, the 
catalytic region of the ribozyme can cleave the target. 
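
The scanning step described above can be sketched as follows. This is a minimal illustration, assuming the target is supplied as a plain RNA (or cDNA) string; the function name, the default window of 20 ribonucleotides and the example sequence are hypothetical. The sketch only locates GUA, GUU and GUC triplets and returns the surrounding region; it does not evaluate secondary structure or accessibility.

```python
# Illustrative sketch only: names, window size and sequence are assumptions.
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def find_cleavage_sites(rna: str, window: int = 20):
    """Return (position, triplet, surrounding region) for each candidate site.

    `position` is the 0-based index of the G of the triplet. The region is a
    stretch of up to `window` ribonucleotides roughly centred on the site and
    is shorter when the site lies near the 3' end of the target.
    """
    seq = rna.upper().replace("T", "U")  # accept cDNA-style input as well
    hits = []
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in CLEAVAGE_TRIPLETS:
            start = max(0, i + 1 - window // 2)
            end = min(len(seq), start + window)
            hits.append((i, triplet, seq[start:end]))
    return hits
```

Each returned region could then be passed to a structure-prediction or accessibility assay of the kind mentioned above before a ribozyme is designed against it.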

Ribozymes can be introduced into cells as part of a DNA construct. Mechanical methods, such as 
microinjection, liposome-mediated transfection, electroporation, or calcium phosphate 
precipitation, can be used to introduce a ribozyme-containing DNA construct into cells in which it 
is desired to decrease „BREAST CANCER GENE" expression. Alternatively, if it is desired that 
the cells stably retain the DNA construct, the construct can be supplied on a plasmid and 
maintained as a separate element or integrated into the genome of the cells, as is known in the art. 
A ribozyme-encoding DNA construct can include transcriptional regulatory elements, such as a 
promoter element, an enhancer or UAS element, and a transcriptional terminator signal, for 
controlling transcription of ribozymes in the cells. 

As taught in Haseloff et al., U.S. Pat. No. 5,641,673, ribozymes can be engineered so that ribozyme 
expression will occur in response to factors which induce expression of a target gene. Ribozymes 
also can be engineered to provide an additional level of regulation, so that destruction of mRNA 
occurs only when both a ribozyme and a target gene are induced in the cells. 

Polypeptides 

"BREAST CANCER GENE" polypeptides according to the invention comprise a polypeptide 
selected from SEQ ID NO: 27 to 52 and 76 to 98 or encoded by any of the polynucleotide 
sequences of the SEQ ID NO: 1 to 26 and 53 to 75 or derivatives, fragments, analogues and 
homologues thereof. A "BREAST CANCER GENE" polypeptide of the invention therefore can be 
a portion, a full-length, or a fusion protein comprising all or a portion of a "BREAST CANCER 
GENE" polypeptide. 

Protein Purification 

"BREAST CANCER GENE" polypeptides can be purified from any cell which expresses the 
polypeptide, including host cells which have been transfected with "BREAST CANCER GENE" 
expression constructs. Breast tissue is an especially useful source of "BREAST CANCER GENE" 
polypeptides. A purified "BREAST CANCER GENE" polypeptide is separated from other 
compounds which normally associate with the "BREAST CANCER GENE" polypeptide in the 
cell, such as certain proteins, carbohydrates, or lipids, using methods well-known in the art. Such 
methods include, but are not limited to, size exclusion chromatography, ammonium sulfate 
fractionation, ion exchange chromatography, affinity chromatography, and preparative gel 
electrophoresis. A preparation of purified "BREAST CANCER GENE" polypeptides is at least 
80% pure; preferably, the preparations are 90%, 95%, or 99% pure. Purity of the preparations can 
be assessed by any means known in the art, such as SDS-polyacrylamide gel electrophoresis. 

Obtaining Polypeptides 

„BREAST CANCER GENE" polypeptides can be obtained, for example, by purification from 
human cells, by expression of „BREAST CANCER GENE" polynucleotides, or by direct chemical 
synthesis. 

Biologically Active Variants 

"BREAST CANCER GENE" polypeptide variants which are biologically active, i.e., retain a 
"BREAST CANCER GENE" activity, also are "BREAST CANCER GENE" polypeptides. 
Preferably, naturally or non-naturally occurring "BREAST CANCER GENE" polypeptide variants 
have amino acid sequences which are at least about 60, 65, or 70, preferably about 75, 80, 85, 90, 
92, 94, 96, or 98% identical to any of the amino acid sequences of the polypeptides of SEQ ID 
NO: 27 to 52 or 76 to 98 or the polypeptides encoded by any of the polynucleotides of SEQ ID 
NO: 1 to 26 or 53 to 75 or a fragment thereof. Percent identity between a putative "BREAST 
CANCER GENE" polypeptide variant and any of the polypeptides of SEQ ID NO: 27 to 52 or 76 to 98 
or the polypeptides encoded by any of the polynucleotides of SEQ ID NO: 1 to 26 or 53 to 75 or a 
fragment thereof is determined by conventional methods [see, for example, Altschul et al., 1986, 
(90) and Henikoff & Henikoff, 1992, (91)]. Briefly, two amino acid sequences are aligned to 
optimize the alignment scores using a gap opening penalty of 10, a gap extension penalty of 1, and 
the "BLOSUM62" scoring matrix of Henikoff & Henikoff, (91). 
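
As a minimal sketch of the final step only, the following Python function computes a percent identity from two sequences that have already been aligned, for example with the gap penalties and scoring matrix stated above. The alignment itself is not performed here. The function name, the gap symbol and the choice of denominator (all aligned columns, including columns that contain a gap in one sequence) are assumptions made for illustration; other conventions for the denominator exist.

```python
# Illustrative sketch only: the denominator convention is an assumption.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)
```

A putative variant could then be compared against a chosen threshold, e.g. `percent_identity(a, b) >= 75.0`, where `a` and `b` are the two rows of the alignment.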

Those skilled in the art appreciate that there are many established algorithms available to align two 
amino acid sequences. The "FASTA" similarity search algorithm of Pearson & Lipman is a 
suitable protein alignment method for examining the level of identity shared by an amino acid 
sequence disclosed herein and the amino acid sequence of a putative variant [Pearson & Lipman, 
1988, (92), and Pearson, 1990, (93)]. Briefly, FASTA first characterizes sequence similarity by 
identifying regions shared by the query sequence (e.g., SEQ ID NO: 1 to 26 or 53 to 75) and a test 
sequence that have either the highest density of identities (if the ktup variable is 1) or pairs of 
identities (if ktup=2), without considering conservative amino acid substitutions, insertions, or 
deletions. The ten regions with the highest density of identities are then rescored by comparing the 
similarity of all paired amino acids using an amino acid substitution matrix, and the ends of the 
regions are "trimmed" to include only those residues that contribute to the highest score. If there 
are several regions with scores greater than the "cutoff" value (calculated by a predetermined 
formula based upon the length of the sequence and the ktup value), then the trimmed initial regions are 
examined to determine whether the regions can be joined to form an approximate alignment with 
gaps. Finally, the highest scoring regions of the two amino acid sequences are aligned using a 
modification of the Needleman-Wunsch-Sellers algorithm [Needleman & Wunsch, 1970, (94), and 
Sellers, 1974, (95)], which allows for amino acid insertions and deletions. Preferred parameters for 
FASTA analysis are: ktup=1, gap opening penalty=10, gap extension penalty=1, and substitution 
matrix=BLOSUM62. These parameters can be introduced into a FASTA program by modifying 
the scoring matrix file ("SMATRIX"), as explained in Appendix 2 of Pearson, (93). 

FASTA can also be used to determine the sequence identity of nucleic acid molecules using a ratio 
as disclosed above. For nucleotide sequence comparisons, the ktup value can range between one 
and six, preferably from three to six, most preferably three, with other parameters set as default. 
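
The role of the ktup value can be illustrated with a simplified sketch of the first, word-matching stage only. The code below indexes every word of length ktup in the query, looks each word of the test sequence up in that index, and counts the hits on each alignment diagonal; the diagonal with the most hits corresponds to the region with the highest density of identities. This is not the FASTA program itself: rescoring with a substitution matrix, trimming, joining of regions and the final gapped alignment are omitted, and all names are hypothetical.

```python
# Illustrative sketch of ktup word matching only; not the FASTA implementation.
from collections import Counter, defaultdict


def ktup_diagonal_hits(query: str, test: str, ktup: int = 1) -> Counter:
    """Count shared words of length `ktup` per diagonal (query index - test index)."""
    if ktup < 1:
        raise ValueError("ktup must be at least 1")
    # Index the start positions of every ktup-length word in the query.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    # Look up each word of the test sequence and tally hits by diagonal.
    diagonals = Counter()
    for j in range(len(test) - ktup + 1):
        for i in index.get(test[j:j + ktup], ()):
            diagonals[i - j] += 1
    return diagonals
```

A larger ktup value requires longer exact matches and therefore yields fewer, more selective hits, which is consistent with the use of larger values for nucleotide sequence comparisons.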

Variations in percent identity can be due, for example, to amino acid substitutions, insertions, or 
deletions. Amino acid substitutions are defined as one for one amino acid replacements. They are 
conservative in nature when the substituted amino acid has similar structural and/or chemical 
properties. Examples of conservative replacements are substitution of a leucine with an isoleucine 
or valine, an aspartate with a glutamate, or a threonine with a serine. 

Amino acid insertions or deletions are changes to or within an amino acid sequence. They typically 
fall in the range of about 1 to 5 amino acids. Guidance in determining which amino acid residues 
can be substituted, inserted, or deleted without abolishing biological or immunological activity of a 
"BREAST CANCER GENE" polypeptide can be found using computer programs well known in 
the art, such as DNASTAR software. Whether an amino acid change results in a biologically active 
"BREAST CANCER GENE" polypeptide can readily be determined by assaying for "BREAST 
CANCER GENE" activity, as described, for example, in the specific Examples below. Larger 
insertions or deletions can also be caused by alternative splicing. Protein domains can be inserted 
or deleted without altering the main activity of the protein. 

Fusion Proteins 

Fusion proteins are useful for generating antibodies against „BREAST CANCER GENE" 
polypeptide amino acid sequences and for use in various assay systems. For example, fusion 
proteins can be used to identify proteins which interact with portions of a "BREAST CANCER 
GENE" polypeptide. Protein affinity chromatography or library-based assays for protein-protein 
interactions, such as the yeast two-hybrid or phage display systems, can be used for this purpose. 
Such methods are well known in the art and also can be used as drug screens. 

A "BREAST CANCER GENE" polypeptide fusion protein comprises two polypeptide segments 
fused together by means of a peptide bond. The first polypeptide segment comprises at least 25, 
50, 75, 100, 150, 200, 300, 400, 500, 600, 700 or 750 contiguous amino acids of an amino acid 
sequence encoded by any polynucleotide sequences of the SEQ ID NO: 1 to 26 or 53 to 75 or of a 
biologically active variant, such as those described above. The first polypeptide segment also can 
comprise full-length „BREAST CANCER GENE". 

The second polypeptide segment can be a full-length protein or a protein fragment. Proteins 
commonly used in fusion protein construction include β-galactosidase, β-glucuronidase, green 
fluorescent protein (GFP), autofluorescent proteins, including blue fluorescent protein (BFP), 
glutathione-S-transferase (GST), luciferase, horseradish peroxidase (HRP), and chloramphenicol 
acetyltransferase (CAT). Additionally, epitope tags are used in fusion protein constructions, 
including histidine (His) tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G 
tags, and thioredoxin (Trx) tags. Other fusion constructions can include maltose binding protein 
(MBP), S-tag, LexA DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, 
and herpes simplex virus (HSV) VP16 protein fusions. A fusion protein also can be engineered to 
contain a cleavage site located between the "BREAST CANCER GENE" polypeptide-encoding 
sequence and the heterologous protein sequence, so that the "BREAST CANCER GENE" 
polypeptide can be cleaved and purified away from the heterologous moiety. 

A fusion protein can be synthesized chemically, as is known in the art. Preferably, a fusion protein 
is produced by covalently linking two polypeptide segments or by standard procedures in the art of 
molecular biology. Recombinant DNA methods can be used to prepare fusion proteins, for 
example, by making a DNA construct which comprises coding sequences selected from any of the 
polynucleotide sequences of the SEQ ID NO: 1 to 26 and 53 to 75 in proper reading frame with 
nucleotides encoding the second polypeptide segment and expressing the DNA construct in a host 
cell, as is known in the art. Many kits for constructing fusion proteins are available from 
companies such as Promega Corporation (Madison, WI), Stratagene (La Jolla, CA), CLONTECH 
(Mountain View, CA), Santa Cruz Biotechnology (Santa Cruz, CA), MBL International 
Corporation (MIC; Watertown, MA), and Quantum Biotechnologies (Montreal, Canada; 
1-888-DNA-KITS). 

Identification of Species Homologues 

Species homologues of a human "BREAST CANCER GENE" polypeptide can be obtained using 
"BREAST CANCER GENE" polynucleotides (described below) to make suitable 
probes or primers for screening cDNA expression libraries from other species, such as mice, 
monkeys, or yeast, identifying cDNAs which encode homologues of a "BREAST CANCER 
GENE" polypeptide, and expressing the cDNAs as is known in the art. 

To express a "BREAST CANCER GENE" polynucleotide, the polynucleotide can be inserted into 
an expression vector which contains the necessary elements for the transcription and translation of 
the inserted coding sequence. Methods which are well known to those skilled in the art can be 
used to construct expression vectors containing sequences encoding „BREAST CANCER GENE" 
polypeptides and appropriate transcriptional and translational control elements. These methods 
include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic 
recombination. Such techniques are described, for example, in Sambrook et al., (77) and in 
Ausubel et al., (78). 

A variety of expression vector/host systems can be utilized to contain and express sequences 
encoding a „BREAST CANCER GENE" polypeptide. These include, but are not limited to, 
microorganisms, such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid 
DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems 
infected with virus expression vectors (e.g., baculovirus); plant cell systems transformed with virus 
expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or with 
bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems. 

The control elements or regulatory sequences are those regions of the vector (enhancers, promoters, 
and 5' and 3' untranslated regions) which interact with host cellular proteins to carry out transcription 
and translation. Such elements can vary in their strength and specificity. Depending on the vector 
system and host utilized, any number of suitable transcription and translation elements, including 
constitutive and inducible promoters, can be used. For example, when cloning in bacterial systems, 
inducible promoters such as the hybrid lacZ promoter of the BLUESCRIPT phagemid (Stratagene, 
La Jolla, Calif.) or pSPORT1 plasmid (Life Technologies) and the like can be used. The 
baculovirus polyhedrin promoter can be used in insect cells. Promoters or enhancers derived from 
the genomes of plant cells (e.g., heat shock, RUBISCO, and storage protein genes) or from plant 
viruses (e.g., viral promoters or leader sequences) can be cloned into the vector. In mammalian 
cell systems, promoters from mammalian genes or from mammalian viruses are preferable. If it is 
necessary to generate a cell line that contains multiple copies of a nucleotide sequence encoding a 
"BREAST CANCER GENE" polypeptide, vectors based on SV40 or EBV can be used with an 
appropriate selectable marker. 

Bacterial and Yeast Expression Systems 



In bacterial systems, a number of expression vectors can be selected depending upon the use 
intended for the "BREAST CANCER GENE" polypeptide. For example, when a large quantity of 
the "BREAST CANCER GENE" polypeptide is needed for the induction of antibodies, vectors 
which direct high level expression of fusion proteins that are readily purified can be used. Such 
vectors include, but are not limited to, multifunctional E. coli cloning and expression vectors such 
as BLUESCRIPT (Stratagene). In a BLUESCRIPT vector, a sequence encoding the "BREAST 
CANCER GENE" polypeptide can be ligated into the vector in frame with sequences for the 
amino terminal Met and the subsequent 7 residues of β-galactosidase so that a hybrid protein is 
produced. pIN vectors [Van Heeke & Schuster, (17)] or pGEX vectors (Promega, Madison, Wis.) 
also can be used to express foreign polypeptides as fusion proteins with glutathione S-transferase 
(GST). In general, such fusion proteins are soluble and can easily be purified from lysed cells by 
adsorption to glutathione agarose beads followed by elution in the presence of free glutathione. 
Proteins made in such systems can be designed to include heparin, thrombin, or factor Xa protease 
cleavage sites so that the cloned polypeptide of interest can be released from the GST moiety at 
will. 

In the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive or inducible 
promoters such as alpha factor, alcohol oxidase, and PGH can be used. For reviews, see Ausubel et 
al., (4) and Grant et al., (18). 

Plant and Insect Expression Systems 

If plant expression vectors are used, the expression of sequences encoding "BREAST CANCER 
GENE" polypeptides can be driven by any of a number of promoters. For example, viral promoters 
such as the 35S and 19S promoters of CaMV can be used alone or in combination with the omega 
leader sequence from TMV [Takamatsu, 1987, (96)]. Alternatively, plant promoters such as the 
small subunit of RUBISCO or heat shock promoters can be used [Coruzzi et al., 1984, (97); 
Broglie et al., 1984, (98); Winter et al., 1991, (99)]. These constructs can be introduced into plant 
cells by direct DNA transformation or by pathogen-mediated transfection. Such techniques are 
described in a number of generally available reviews. 

An insect system also can be used to express a „BREAST CANCER GENE" polypeptide. For 
example, in one such system Autographa californica nuclear polyhedrosis virus (AcNPV) is used 
as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. 
Sequences encoding „BREAST CANCER GENE" polypeptides can be cloned into a nonessential 
region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin 
promoter. Successful insertion of sequences encoding "BREAST CANCER GENE" polypeptides will render the 
polyhedrin gene inactive and produce recombinant virus lacking coat protein. The recombinant 
viruses can then be used to infect S. frugiperda cells or Trichoplusia larvae in which „BREAST 
CANCER GENE" polypeptides can be expressed [Engelhard et al., 1994, (100)]. 

A number of viral-based expression systems can be used to express "BREAST CANCER GENE" 
polypeptides in mammalian host cells. For example, if an adenovirus is used as an expression 
vector, sequences encoding "BREAST CANCER GENE" polypeptides can be ligated into an 
adenovirus transcription/translation complex comprising the late promoter and tripartite leader 
sequence. Insertion in a nonessential E1 or E3 region of the viral genome can be used to obtain a 
viable virus which is capable of expressing a "BREAST CANCER GENE" polypeptide in infected 
host cells [Logan & Shenk, 1984, (101)]. If desired, transcription enhancers, such as the Rous 
sarcoma virus (RSV) enhancer, can be used to increase expression in mammalian host cells. 

Human artificial chromosomes (HACs) also can be used to deliver larger fragments of DNA than 
can be contained and expressed in a plasmid. HACs of 6M to 10M are constructed and delivered 
to cells via conventional delivery methods (e.g., liposomes, polycationic amino polymers, or 
vesicles). 

Specific initiation signals also can be used to achieve more efficient translation of sequences 
encoding "BREAST CANCER GENE" polypeptides. Such signals include the ATG initiation 
codon and adjacent sequences. In cases where sequences encoding a „BREAST CANCER GENE" 
polypeptide, its initiation codon, and upstream sequences are inserted into the appropriate 
expression vector, no additional transcriptional or translational control signals may be needed. 
However, in cases where only coding sequence, or a fragment thereof, is inserted, exogenous 
translational control signals (including the ATG initiation codon) should be provided. The 
initiation codon should be in the correct reading frame to ensure translation of the entire insert. 
Exogenous translational elements and initiation codons can be of various origins, both natural and 
synthetic. The efficiency of expression can be enhanced by the inclusion of enhancers which are 
appropriate for the particular cell system which is used [Scharf et al., 1994, (102)]. 
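
The reading-frame requirement stated above can be checked with a short sketch such as the following. It assumes the insert is supplied as a DNA string that is intended to begin at the initiation codon and end at a stop codon; the function name and the messages are hypothetical, and the check covers only the standard stop codons TAA, TAG and TGA.

```python
# Illustrative sketch only: names and messages are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_insert(insert: str) -> list[str]:
    """Return a list of problems found in a coding insert (empty if none)."""
    seq = insert.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("no ATG initiation codon at the start of the insert")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    # Split into complete codons in the frame set by the first nucleotide.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # A stop codon anywhere before the last codon would truncate translation.
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {number}")
            break
    return problems
```

An empty result indicates that the insert starts with ATG, consists of whole codons and contains no stop codon before its last codon, so that translation of the entire insert is at least possible in that frame.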

Host Cells 

A host cell strain can be chosen for its ability to modulate the expression of the inserted sequences 
or to process the expressed „BREAST CANCER GENE" polypeptide in the desired fashion. Such 
modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, 
glycosylation, phosphorylation, lipidation, and acylation. Posttranslational processing which 
cleaves a "prepro" form of the polypeptide also can be used to facilitate correct insertion, folding 
and/or function. Different host cells which have specific cellular machinery and characteristic 
mechanisms for post-translational activities (e.g., CHO, HeLa, MDCK, HEK293, and WI38) are 
available from the American Type Culture Collection (ATCC; 10801 University Boulevard, 
Manassas, VA 20110-2209) and can be chosen to ensure the correct modification and processing 
of the foreign protein. 

Stable expression is preferred for long-term, high-yield production of recombinant proteins. For 
example, cell lines which stably express „BREAST CANCER GENE" polypeptides can be 
transformed using expression vectors which can contain viral origins of replication and/or 
endogenous expression elements and a selectable marker gene on the same or on a separate vector. 
Following the introduction of the vector, cells can be allowed to grow for 1-2 days in an enriched 
medium before they are switched to a selective medium. The purpose of the selectable marker is to 
confer resistance to selection, and its presence allows growth and recovery of cells which 
successfully express the introduced "BREAST CANCER GENE" sequences. Resistant clones of 
stably transformed cells can be proliferated using tissue culture techniques appropriate to the cell 
type [Freshney et al., 1986, (103)]. 

Any number of selection systems can be used to recover transformed cell lines. These include, but 
are not limited to, the herpes simplex virus thymidine kinase [Wigler et al., 1977, (104)] and 
adenine phosphoribosyltransferase [Lowy et al., 1980, (105)] genes, which can be employed in tk- 
or aprt- cells, respectively. Also, antimetabolite, antibiotic, or herbicide resistance can be used as 
the basis for selection. For example, dhfr confers resistance to methotrexate [Wigler et al., 1980, 
(106)], npt confers resistance to the aminoglycosides neomycin and G418 [Colbere-Garapin et al., 
1981, (107)], and als and pat confer resistance to chlorsulfuron and phosphinotricin 
acetyltransferase, respectively. Additional selectable genes have been described. For example, 
trpB allows cells to utilize indole in place of tryptophan, and hisD allows cells to utilize 
histinol in place of histidine [Hartman & Mulligan, 1988, (108)]. Visible markers such as 
anthocyanins, β-glucuronidase and its substrate GUS, and luciferase and its substrate luciferin, can 
be used to identify transformants and to quantify the amount of transient or stable protein 
expression attributable to a specific vector system [Rhodes et al., 1995, (109)]. 

Detecting Expression and Gene Product 

Although the presence of marker gene expression suggests that the "BREAST CANCER GENE" 
polynucleotide is also present, its presence and expression may need to be confirmed. For example, 
if a sequence encoding a "BREAST CANCER GENE" polypeptide is inserted within a marker 
gene sequence, transformed cells containing sequences which encode a "BREAST CANCER 
GENE" polypeptide can be identified by the absence of marker gene function. Alternatively, a 
marker gene can be placed in tandem with a sequence encoding a "BREAST CANCER GENE" 
polypeptide under the control of a single promoter. Expression of the marker gene in response to 
induction or selection usually indicates expression of the "BREAST CANCER GENE" 
polynucleotide. 

Alternatively, host cells which contain a "BREAST CANCER GENE" polynucleotide and which 
express a "BREAST CANCER GENE" polypeptide can be identified by a variety of procedures 
known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or 
DNA-RNA hybridization and protein bioassay or immunoassay techniques which include 
membrane, solution, or chip-based technologies for the detection and/or quantification of 
polynucleotide or protein. For example, the presence of a polynucleotide sequence encoding a 
"BREAST CANCER GENE" polypeptide can be detected by DNA-DNA or DNA-RNA 
hybridization or amplification using probes or fragments of polynucleotides encoding 
a "BREAST CANCER GENE" polypeptide. Nucleic acid amplification-based assays involve the 
use of oligonucleotides selected from sequences encoding a "BREAST CANCER GENE" 
polypeptide to detect transformants which contain a "BREAST CANCER GENE" polynucleotide. 

A variety of protocols for detecting and measuring the expression of a "BREAST CANCER 
GENE" polypeptide, using either polyclonal or monoclonal antibodies specific for the polypeptide, 
are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), 
radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A two-site, 
monoclonal-based immunoassay using monoclonal antibodies reactive to two non-interfering epitopes on a 
"BREAST CANCER GENE" polypeptide can be used, or a competitive binding assay can be 
employed. These and other assays are described in Hampton et al., (110) and Maddox et al., (111). 

A wide variety of labels and conjugation techniques are known by those skilled in the art and can 
be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization 
or PCR probes for detecting sequences related to polynucleotides encoding "BREAST CANCER 
GENE" polypeptides include oligo labeling, nick translation, end-labeling, or PCR amplification 
using a labeled nucleotide. Alternatively, sequences encoding a "BREAST CANCER GENE" 
polypeptide can be cloned into a vector for the production of an mRNA probe. Such vectors are 
known in the art, are commercially available, and can be used to synthesize RNA probes in vitro 
by addition of labeled nucleotides and an appropriate RNA polymerase such as T7, T3, or SP6. 
These procedures can be conducted using a variety of commercially available kits (Amersham 
Pharmacia Biotech, Promega, and US Biochemical). Suitable reporter molecules or labels which 
can be used for ease of detection include radionuclides, enzymes, and fluorescent, 
chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic 
particles, and the like. 

Expression and Purification of Polypeptides 

Host cells transformed with nucleotide sequences encoding a „BREAST CANCER GENE" 
polypeptide can be cultured under conditions suitable for the expression and recovery of the 
protein from cell culture. The polypeptide produced by a transformed cell can be secreted or 
stored intracellularly depending on the sequence and/or the vector used. As will be understood by 
those of skill in the art, expression vectors containing polynucleotides which encode "BREAST 
CANCER GENE" polypeptides can be designed to contain signal sequences which direct secretion 
of soluble "BREAST CANCER GENE" polypeptides through a prokaryotic or eukaryotic cell 
membrane or which direct the membrane insertion of membrane-bound "BREAST CANCER 
GENE" polypeptide. 

As discussed above, other constructions can be used to join a sequence encoding a „BREAST 
CANCER GENE" polypeptide to a nucleotide sequence encoding a polypeptide domain which will 
facilitate purification of soluble proteins. Such purification facilitating domains include, but are 
not limited to, metal chelating peptides such as histidine-tryptophan modules that allow 
purification on immobilized metals, protein A domains that allow purification on immobilized 
immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system 
(Immunex Corp., Seattle, Wash.). Inclusion of cleavable linker sequences such as those specific 
for Factor Xa or enterokinase (Invitrogen, San Diego, CA) between the purification domain and 
the "BREAST CANCER GENE" polypeptide also can be used to facilitate purification. One such 
expression vector provides for expression of a fusion protein containing a "BREAST CANCER 
GENE" polypeptide and 6 histidine residues preceding a thioredoxin or an enterokinase cleavage 
site. The histidine residues facilitate purification by IMAC (immobilized metal ion affinity 
chromatography) [Porath et al., 1992, (112)], while the enterokinase cleavage site provides a means 
for purifying the "BREAST CANCER GENE" polypeptide from the fusion protein. Vectors 
which contain fusion proteins are disclosed in Kroll et al., (113). 

Chemical Synthesis 

Sequences encoding a "BREAST CANCER GENE" polypeptide can be synthesized, in whole or
in part, using chemical methods well known in the art (see Caruthers et al., (114) and Horn et al.,
(115)). Alternatively, a "BREAST CANCER GENE" polypeptide itself can be produced using
chemical methods to synthesize its amino acid sequence, such as by direct peptide synthesis using
solid-phase techniques [Merrifield, 1963, (116) and Roberge et al., 1995, (117)]. Protein synthesis
can be performed using manual techniques or by automation. Automated synthesis can be
achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer).
Optionally, fragments of "BREAST CANCER GENE" polypeptides can be separately synthesized
and combined using chemical methods to produce a full-length molecule.

The newly synthesized peptide can be substantially purified by preparative high performance
liquid chromatography [Creighton, 1983, (118)]. The composition of a synthetic "BREAST
CANCER GENE" polypeptide can be confirmed by amino acid analysis or sequencing (e.g., the
Edman degradation procedure; see Creighton, (118)). Additionally, any portion of the amino acid
sequence of the "BREAST CANCER GENE" polypeptide can be altered during direct synthesis
and/or combined using chemical methods with sequences from other proteins to produce a variant
polypeptide or a fusion protein.

Production of Altered Polypeptides

As will be understood by those of skill in the art, it may be advantageous to produce "BREAST
CANCER GENE" polypeptide-encoding nucleotide sequences possessing non-naturally occurring
codons. For example, codons preferred by a particular prokaryotic or eukaryotic host can be
selected to increase the rate of protein expression or to produce an RNA transcript having
desirable properties, such as a half-life which is longer than that of a transcript generated from the
naturally occurring sequence.

The nucleotide sequences disclosed herein can be engineered using methods generally known in
the art to alter "BREAST CANCER GENE" polypeptide-encoding sequences for a variety of
reasons, including but not limited to, alterations which modify the cloning, processing, and/or
expression of the polypeptide or mRNA product. DNA shuffling by random fragmentation and
PCR re-assembly of gene fragments and synthetic oligonucleotides can be used to engineer the
nucleotide sequences. For example, site-directed mutagenesis can be used to insert new restriction
sites, alter glycosylation patterns, change codon preference, produce splice variants, introduce
mutations, and so forth.

Predictive, Diagnostic and Prognostic Assays

The present invention provides a method for determining whether a subject is at risk for developing
malignant neoplasia, and breast cancer in particular, by detecting one of the disclosed
polynucleotide markers comprising any of the polynucleotide sequences of the SEQ ID NO: 2 to
6, 8, 9, 11 to 16, 18, 19 or 21 to 26 or 53 to 75 and/or the polypeptide markers encoded thereby or
polypeptide markers comprising any of the polypeptide sequences of the SEQ ID NO: 28 to 32, 34,
35, 37 to 42, 44, 45 or 47 to 52 or 76 to 98, or at least 2 of the disclosed polynucleotides selected
from SEQ ID NO: 1 to 26 and 53 to 75, or at least 2 of the disclosed polypeptides selected from
SEQ ID NO: 28 to 32 and 76 to 98, for malignant neoplasia and breast cancer in particular.

In clinical applications, biological samples can be screened for the presence and/or absence of the
biomarkers identified herein. Such samples are, for example, needle biopsy cores, surgical resection
samples, or body fluids like serum, thin needle nipple aspirates and urine. For example, these
methods include obtaining a biopsy, which is optionally fractionated by cryostat sectioning to
enrich diseased cells to about 80% of the total cell population. In certain embodiments,
polynucleotides extracted from these samples may be amplified using techniques well known in
the art. The expression levels of selected markers detected would be compared with statistically
valid groups of diseased and healthy samples.

In one embodiment the diagnostic method comprises determining whether a subject has an
abnormal mRNA and/or protein level of the disclosed markers, such as by Northern blot analysis,
reverse transcription-polymerase chain reaction (RT-PCR), in situ hybridization,
immunoprecipitation, Western blot hybridization, or immunohistochemistry. According to the
method, cells are obtained from a subject and the levels of the disclosed biomarkers, at the protein or
mRNA level, are determined and compared to the level of these markers in a healthy subject. An
abnormal level of the biomarker polypeptide or mRNA is likely to be indicative of
malignant neoplasia such as breast cancer.

In another embodiment the diagnostic method comprises determining whether a subject has an
abnormal DNA content of said genes or said genomic loci, such as by Southern blot analysis, dot
blot analysis, fluorescence or colorimetric in situ hybridization, comparative genomic
hybridization, genotyping by VNTR, STS-PCR or quantitative PCR. In general these assays
comprise the usage of probes from representative genomic regions. The probes contain at least
parts of said genomic regions or sequences complementary or analogous to said regions, in
particular intra- or intergenic regions of said genes or genomic regions. The probes can consist of
nucleotide sequences or sequences of analogous functions (e.g. PNAs, Morpholino oligomers)
being able to bind to target regions by hybridization. In general, genomic regions being altered in
said patient samples are compared with unaffected control samples (normal tissue from the same
or different patients, surrounding unaffected tissue, peripheral blood) or with genomic regions of
the same sample that do not have said alterations and can therefore serve as internal controls. In a
preferred embodiment regions located on the same chromosome are used. Alternatively,
gonosomal regions and/or regions with defined varying amount in the sample are used. In one
favored embodiment the DNA content, structure, composition or modification of distinct genomic
regions is compared. Especially favored are methods that detect the DNA content of
said samples, where the amount of target regions is altered by amplifications and/or deletions. In
another embodiment the target regions are analyzed for the presence of polymorphisms (e.g. Single
Nucleotide Polymorphisms or mutations) that affect or predispose the cells in said samples with
regard to clinical aspects, being of diagnostic, prognostic or therapeutic value. Preferably, the
identification of sequence variations is used to define haplotypes that result in characteristic
behavior of said samples with said clinical aspects.

The following examples of genes in 17q12-21.2 are offered by way of illustration, not by way of
limitation. 

One embodiment of the invention is a method for the prediction, diagnosis or prognosis of
malignant neoplasia by the detection of at least 10, at least 5, or at least 4, or at least 3 and more
preferably at least 2 markers whereby the markers are genes and fragments thereof and/or genomic
nucleic acid sequences that are located on one chromosomal region which is altered in malignant
neoplasia.

One further embodiment of the invention is a method for the prediction, diagnosis or prognosis of
malignant neoplasia by the detection of at least 10, at least 5, or at least 4, or at least 3 and more
preferably at least 2 markers whereby the markers (a) are genes and fragments thereof and/or
genomic nucleic acid sequences that are located on one or more chromosomal region(s) which
is/are altered in malignant neoplasia and (b) functionally interact as (i) receptor and ligand or (ii)
members of the same signal transduction pathway or (iii) members of synergistic signal
transduction pathways or (iv) members of antagonistic signal transduction pathways or (v)
transcription factor and transcription factor binding site.

In one embodiment, the method for the prediction, diagnosis or prognosis of malignant neoplasia
and breast cancer in particular is done by the detection of:

(a) a polynucleotide selected from the polynucleotides of the SEQ ID NO: 2 to 6, 8, 9, 11 to 16,
18, 19, 21 to 26 or 53 to 75;

(b) a polynucleotide which hybridizes under stringent conditions to a polynucleotide specified
in (a) encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3;

(c) a polynucleotide the sequence of which deviates from the polynucleotide specified in (a)
and (b) due to the degeneracy of the genetic code encoding a polypeptide exhibiting the
same biological function as specified for the respective sequence in Table 2 or 3;

(d) a polynucleotide which represents a specific fragment, derivative or allelic variation of a
polynucleotide sequence specified in (a) to (c);

in a biological sample comprising the following steps: hybridizing any polynucleotide or
analogous oligomer specified in (a) to (d) to a polynucleotide material of a biological sample,
thereby forming a hybridization complex; and detecting said hybridization complex.

In another embodiment the method for the prediction, diagnosis or prognosis of malignant 
neoplasia is done as just described but, wherein before hybridization, the polynucleotide material 
of the biological sample is amplified. 

In another embodiment the method for the diagnosis or prognosis of malignant neoplasia and
breast cancer in particular is done by the detection of:

(a) a polynucleotide selected from the polynucleotides of the SEQ ID NO: 2 to 6, 8, 9, 11 to
16, 18, 19, 21 to 26 or 53 to 75;

(b) a polynucleotide which hybridizes under stringent conditions to a polynucleotide specified
in (a) encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3;

(c) a polynucleotide the sequence of which deviates from the polynucleotide specified in (a)
and (b) due to the degeneracy of the genetic code encoding a polypeptide exhibiting the
same biological function as specified for the respective sequence in Table 2 or 3;

(d) a polynucleotide which represents a specific fragment, derivative or allelic variation of a
polynucleotide sequence specified in (a) to (c);

(e) a polypeptide encoded by a polynucleotide sequence specified in (a) to (d);

(f) a polypeptide comprising any polypeptide of SEQ ID NO: 28 to 32, 34, 35, 37 to 42, 44,
45, 47 to 52 or 76 to 98;

comprising the steps of contacting a biological sample with a reagent which specifically interacts
with the polynucleotide specified in (a) to (d) or the polypeptide specified in (e).

DNA array technology 

In one embodiment, the present invention also provides a method wherein polynucleotide probes
are immobilized on a DNA chip in an organized array. Oligonucleotides can be bound to a solid
support by a variety of processes, including lithography. For example, a chip can hold up to
410,000 oligonucleotides (GeneChip, Affymetrix). The present invention provides significant
advantages over the available tests for malignant neoplasia, such as breast cancer, because it
increases the reliability of the test by providing an array of polynucleotide markers on a single
chip.

The method includes obtaining a biopsy of an affected person, which is optionally fractionated by
cryostat sectioning to enrich diseased cells to about 80% of the total cell population, and the use of
body fluids such as serum or urine, or cell containing liquids (e.g. derived from fine needle
aspirates). The DNA or RNA is then extracted, amplified, and analyzed with a DNA chip to
determine the presence or absence of the marker polynucleotide sequences. In one embodiment,
the polynucleotide probes are spotted onto a substrate in a two-dimensional matrix or array.
Samples of polynucleotides can be labeled and then hybridized to the probes. Double-stranded
polynucleotides, comprising the labeled sample polynucleotides bound to probe polynucleotides,
can be detected once the unbound portion of the sample is washed away.

The probe polynucleotides can be spotted on substrates including glass, nitrocellulose, etc. The
probes can be bound to the substrate by either covalent bonds or by non-specific interactions, such
as hydrophobic interactions. The sample polynucleotides can be labeled using radioactive labels,
fluorophores, chromophores, etc. Techniques for constructing arrays and methods of using these
arrays are described in EP 0 799 897; WO 97/29212; WO 97/27317; EP 0 785 280; WO 97/02357;
U.S. Pat. No. 5,593,839; U.S. Pat. No. 5,578,832; EP 0 728 520; U.S. Pat. No. 5,599,695; EP 0 721
016; U.S. Pat. No. 5,556,752; WO 95/22058; and U.S. Pat. No. 5,631,734. Further, arrays can be
used to examine differential expression of genes and can be used to determine gene function. For
example, arrays of the instant polynucleotide sequences can be used to determine if any of the
polynucleotide sequences are differentially expressed between normal cells and diseased cells.
High expression of a particular message in a diseased sample, which is not observed in a
corresponding normal sample, can indicate a breast cancer specific protein.

Accordingly, in one aspect, the invention provides probes and primers that are specific to the 
unique polynucleotide markers disclosed herein. 

In one embodiment, the method comprises using a polynucleotide probe to determine the presence
of malignant cells, or breast cancer cells in particular, in a tissue from a patient. Specifically, the
method comprises:

1) providing a polynucleotide probe comprising a nucleotide sequence at least 12 nucleotides
in length, preferably at least 15 nucleotides, more preferably 25 nucleotides, and most
preferably at least 40 nucleotides, and up to all or nearly all of the coding sequence, which
is complementary to a portion of the coding sequence of a polynucleotide selected from the
polynucleotides of SEQ ID NO: 1 to 26 and 53 to 75 or a sequence complementary thereto
and is differentially expressed in malignant neoplasia, such as breast cancer;

2) obtaining a tissue sample from a patient with malignant neoplasia;

3) providing a second tissue sample from a patient with no malignant neoplasia;

4) contacting the polynucleotide probe under stringent conditions with RNA of each of said
first and second tissue samples (e.g., in a Northern blot or in situ hybridization assay); and

5) comparing (a) the amount of hybridization of the probe with RNA of the first tissue
sample, with (b) the amount of hybridization of the probe with RNA of the second tissue
sample;

wherein a statistically significant difference in the amount of hybridization with the RNA of the
first tissue sample as compared to the amount of hybridization with the RNA of the second tissue
sample is indicative of malignant neoplasia and breast cancer in particular in the first tissue
sample.

Data analysis methods 

Comparison of the expression levels of one or more "BREAST CANCER GENES" with reference 
expression levels, e.g., expression levels in diseased cells of breast cancer or in normal counterpart 
cells, is preferably conducted using computer systems. In one embodiment, expression levels are 
obtained in two cells and these two sets of expression levels are introduced into a computer system 
for comparison. In a preferred embodiment, one set of expression levels is entered into a computer 
system for comparison with values that are already present in the computer system, or in computer- 
readable form that is then entered into the computer system. 

In one embodiment, the invention provides a computer readable form of the gene expression
profile data of the invention, or of values corresponding to the level of expression of at least one
"BREAST CANCER GENE" in a diseased cell. The values can be mRNA expression levels
obtained from experiments, e.g., microarray analysis. The values can also be mRNA levels
normalized relative to a reference gene whose expression is constant in numerous cells under
numerous conditions, e.g., GAPDH. In other embodiments, the values in the computer are ratios
of, or differences between, normalized or non-normalized mRNA levels in different samples.

The gene expression profile data can be in the form of a table, such as an Excel table. The data can
be alone, or it can be part of a larger database, e.g., comprising other expression profiles. For
example, the expression profile data of the invention can be part of a public database. The
computer readable form can be in a computer. In another embodiment, the invention provides a
computer displaying the gene expression profile data.
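
The normalization and ratio values described above can be illustrated with a minimal sketch. It assumes expression levels are held as a simple mapping from gene name to measured mRNA level; the function names and the example gene names GENE_A and GENE_B are hypothetical and are not part of the disclosure. Only GAPDH is named in the text as a possible reference gene.

```python
from typing import Dict


def normalize_to_reference(levels: Dict[str, float],
                           reference_gene: str = "GAPDH") -> Dict[str, float]:
    """Divide every mRNA level by the level of the reference gene.

    The reference gene itself is dropped from the result, since its
    normalized value is 1 by construction.
    """
    ref = levels[reference_gene]
    if ref <= 0:
        raise ValueError("reference gene level must be positive")
    return {gene: value / ref
            for gene, value in levels.items()
            if gene != reference_gene}


def expression_ratios(sample_a: Dict[str, float],
                      sample_b: Dict[str, float]) -> Dict[str, float]:
    """Ratio sample_a / sample_b for each gene measured in both samples."""
    shared = sorted(sample_a.keys() & sample_b.keys())
    return {gene: sample_a[gene] / sample_b[gene]
            for gene in shared
            if sample_b[gene] != 0}


# Hypothetical raw levels (arbitrary units) for a diseased and a normal sample.
diseased = {"GAPDH": 200.0, "GENE_A": 800.0, "GENE_B": 100.0}
normal = {"GAPDH": 100.0, "GENE_A": 100.0, "GENE_B": 50.0}

ratios = expression_ratios(normalize_to_reference(diseased),
                           normalize_to_reference(normal))
print(ratios)
```

In this illustration the normalized level of GENE_A is four times higher in the diseased sample, while GENE_B is unchanged once the difference in GAPDH signal is accounted for.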

In one embodiment, the invention provides a method for determining the similarity between the
level of expression of one or more "BREAST CANCER GENES" in a first cell, e.g., a cell of a
subject, and that in a second cell, comprising obtaining the level of expression of one or more
"BREAST CANCER GENES" in a first cell and entering these values into a computer comprising
a database including records comprising values corresponding to levels of expression of one or
more "BREAST CANCER GENES" in a second cell, and processor instructions, e.g., a user
interface, capable of receiving a selection of one or more values for comparison purposes with data
that is stored in the computer. The computer may further comprise a means for converting the
comparison data into a diagram or chart or other type of output.

In another embodiment, values representing expression levels of "BREAST CANCER GENES" 
are entered into a computer system, comprising one or more databases with reference expression 
levels obtained from more than one cell. For example, the computer comprises expression data of 
diseased and normal cells. Instructions are provided to the computer, and the computer is capable 
of comparing the data entered with the data in the computer to determine whether the data entered 
is more similar to that of a normal cell or of a diseased cell. 
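
As an illustration of such a comparison, the following sketch assigns an entered profile to whichever stored reference profile it correlates with best. The use of the Pearson correlation coefficient as the similarity measure is an assumption made for this example; the text does not prescribe a particular measure. The profile values and labels are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def most_similar_profile(query: List[float],
                         references: Dict[str, List[float]]) -> str:
    """Return the label of the reference profile most correlated with query."""
    return max(references, key=lambda label: pearson(query, references[label]))


# Hypothetical reference data: one profile per cell type, same gene order.
references = {
    "normal": [1.0, 1.0, 2.0, 1.0],
    "diseased": [1.0, 4.0, 2.0, 6.0],
}
query = [1.2, 3.5, 2.1, 5.0]

print(most_similar_profile(query, references))
```

The same function could serve the staging and drug-response embodiments by storing one reference profile per stage or per response class instead of per cell type.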

In another embodiment, the computer comprises values of expression levels in cells of subjects at
different stages of breast cancer, and the computer is capable of comparing expression data entered
into the computer with the data stored, and of producing results indicating to which of the expression
profiles in the computer the one entered is most similar, such as to determine the stage of breast
cancer in the subject.

In yet another embodiment, the reference expression profiles in the computer are expression
profiles from cells of breast cancer of one or more subjects, which cells are treated in vivo or in
vitro with a drug used for therapy of breast cancer. Upon entering of expression data of a cell of a
subject treated in vitro or in vivo with the drug, the computer is instructed to compare the data
entered to the data in the computer, and to provide results indicating whether the expression data
input into the computer are more similar to those of a cell of a subject that is responsive to the drug
or more similar to those of a cell of a subject that is not responsive to the drug. Thus, the results
indicate whether the subject is likely to respond to the treatment with the drug or unlikely to
respond to it.

In one embodiment, the invention provides a system that comprises a means for receiving gene
expression data for one or a plurality of genes; a means for comparing the gene expression data
from each of said one or plurality of genes to a common reference frame; and a means for
presenting the results of the comparison. This system may further comprise a means for clustering
the data.

In another embodiment, the invention provides a computer program for analyzing gene expression 
data comprising (i) a computer code that receives as input gene expression data for a plurality of 
genes and (ii) a computer code that compares said gene expression data from each of said plurality 
of genes to a common reference frame. 

The invention also provides a machine-readable or computer-readable medium including program
instructions for performing the following steps: (i) comparing a plurality of values corresponding
to expression levels of one or more genes characteristic of breast cancer in a query cell with a
database including records comprising reference expression or expression profile data of one or
more reference cells and an annotation of the type of cell; and (ii) indicating to which cell the
query cell is most similar based on similarities of expression profiles. The reference cells can be
cells from subjects at different stages of breast cancer. The reference cells can also be cells from
subjects responding or not responding to a particular drug treatment and optionally incubated in
vitro or in vivo with the drug.

The reference cells may also be cells from subjects responding or not responding to several
different treatments, and the computer system indicates a preferred treatment for the subject.
Accordingly, the invention provides a method for selecting a therapy for a patient having breast
cancer, the method comprising: (i) providing the level of expression of one or more genes
characteristic of breast cancer in a diseased cell of the patient; (ii) providing a plurality of
reference profiles, each associated with a therapy, wherein the subject expression profile and each
reference profile has a plurality of values, each value representing the level of expression of a gene
characteristic of breast cancer; and (iii) selecting the reference profile most similar to the subject
expression profile, to thereby select a therapy for said patient. In a preferred embodiment step (iii)
is performed by a computer. The most similar reference profile may be selected by weighting a
comparison value of the plurality using a weight value associated with the corresponding
expression data.
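
Step (iii) with per-gene weights can be sketched as follows. The weighted sum of squared differences used here is an assumed comparison value chosen for illustration only; the therapy labels, profile values and weights are hypothetical.

```python
from typing import Dict, List


def weighted_distance(subject: List[float],
                      reference: List[float],
                      weights: List[float]) -> float:
    """Weighted sum of squared per-gene differences (smaller = more similar)."""
    if not (len(subject) == len(reference) == len(weights)):
        raise ValueError("profiles and weights must have the same length")
    return sum(w * (s - r) ** 2
               for s, r, w in zip(subject, reference, weights))


def select_therapy(subject: List[float],
                   reference_profiles: Dict[str, List[float]],
                   weights: List[float]) -> str:
    """Return the therapy whose reference profile is closest to the subject."""
    return min(reference_profiles,
               key=lambda therapy: weighted_distance(
                   subject, reference_profiles[therapy], weights))


# Hypothetical profiles over three genes, in the same gene order.
subject_profile = [2.0, 1.0, 3.0]
reference_profiles = {
    "therapy_A": [2.0, 3.0, 3.0],
    "therapy_B": [3.0, 1.0, 2.0],
}

# Equal weights: the large difference in the second gene rules out therapy_A.
print(select_therapy(subject_profile, reference_profiles, [1.0, 1.0, 1.0]))
# Down-weighting the second gene changes which reference profile is closest.
print(select_therapy(subject_profile, reference_profiles, [1.0, 0.1, 1.0]))
```

The two calls show why the weight values matter: the same subject profile is matched to a different reference profile once a less informative gene is down-weighted.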

The relative abundance of an mRNA in two biological samples can be scored as a perturbation and
its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested), or
as not perturbed (i.e., the relative abundance is the same). In various embodiments, a difference
between the two sources of RNA of at least a factor of about 25% (RNA from one source is 25%
more abundant than in the other source), more usually about 50%, even more often by a
factor of about 2 (twice as abundant), 3 (three times as abundant) or 5 (five times as abundant) is
scored as a perturbation. Perturbations can be used by a computer for calculating expression
comparisons.

Preferably, in addition to identifying a perturbation as positive or negative, it is advantageous to
determine the magnitude of the perturbation. This can be carried out, as noted above, by
calculating the ratio of the emission of the two fluorophores used for differential labeling, or by
analogous methods that will be readily apparent to those of skill in the art.
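
The scoring of perturbations and their magnitude can be sketched as below. The function name, the returned labels and the use of a log2 ratio as the magnitude are assumptions made for this example; the default threshold of 1.25 corresponds to the difference of about 25% mentioned above, and the two inputs may be, for instance, the emission intensities of the two fluorophores.

```python
import math
from typing import Tuple


def score_perturbation(level_a: float,
                       level_b: float,
                       min_fold: float = 1.25) -> Tuple[str, float]:
    """Score the abundance of one mRNA in source A relative to source B.

    Returns a label ("positive", "negative" or "not perturbed") and the
    log2 of the ratio level_a / level_b as a signed magnitude.
    """
    if level_a <= 0 or level_b <= 0:
        raise ValueError("levels must be positive")
    ratio = level_a / level_b
    # Fold change irrespective of direction, always >= 1.
    fold = max(ratio, 1.0 / ratio)
    magnitude = math.log2(ratio)
    if fold < min_fold:
        return "not perturbed", magnitude
    return ("positive" if ratio > 1.0 else "negative"), magnitude


print(score_perturbation(200.0, 100.0))                # twice as abundant in A
print(score_perturbation(110.0, 100.0))                # below the 25% threshold
print(score_perturbation(100.0, 400.0, min_fold=2.0))  # stricter threshold
```

Using the logarithm of the ratio makes increases and decreases of the same fold change symmetric around zero, which is one common design choice and not the only possible one.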

The computer readable medium may further comprise a pointer to a descriptor of a stage of breast
cancer or to a treatment for breast cancer.

In operation, the means for receiving gene expression data, the means for comparing the gene
expression data, the means for presenting, the means for normalizing, and the means for clustering
within the context of the systems of the present invention can involve a programmed computer
with the respective functionalities described herein, implemented in hardware or hardware and
software; a logic circuit or other component of a programmed computer that performs the
operations specifically identified herein, dictated by a computer program; or a computer memory
encoded with executable instructions representing a computer program that can cause a computer
to function in the particular fashion described herein.

Those skilled in the art will understand that the systems and methods of the present invention may
be applied to a variety of systems, including IBM-compatible personal computers running MS-DOS
or Microsoft Windows.

The computer may have internal components linked to external components. The internal
components may include a processor element interconnected with a main memory. The computer
system can be an Intel Pentium®-based processor of 200 MHz or greater clock rate and with 32
MB or more of main memory. The external component may comprise a mass storage, which can be
one or more hard disks (which are typically packaged together with the processor and memory).
Such hard disks are typically of 1 GB or greater storage capacity. Other external components
include a user interface device, which can be a monitor, together with an inputting device, which
can be a "mouse", or other graphic input devices, and/or a keyboard. A printing device can also be
attached to the computer.

Typically, the computer system is also linked to a network link, which can be part of an Ethernet
link to other local computer systems, remote computer systems, or wide area communication
networks, such as the Internet. This network link allows the computer system to share data and
processing tasks with other computer systems.

Loaded into memory during operation of this system are several software components, which are
both standard in the art and special to the instant invention. These software components
collectively cause the computer system to function according to the methods of this invention.
These software components are typically stored on a mass storage. A software component
represents the operating system, which is responsible for managing the computer system and its
network interconnections. This operating system can be, for example, of the Microsoft Windows
family, such as Windows 95, Windows 98, or Windows NT. A software component represents
common languages and functions conveniently present on this system to assist programs
implementing the methods specific to this invention. Many high or low level computer languages
can be used to program the analytic methods of this invention. Instructions can be interpreted
during run-time or compiled. Preferred languages include C/C++, and JAVA®. Most preferably,
the methods of this invention are programmed in mathematical software packages which allow
symbolic entry of equations and high-level specification of processing, including algorithms to be
used, thereby freeing a user of the need to procedurally program individual equations or
algorithms. Such packages include Matlab from Mathworks (Natick, Mass.), Mathematica from
Wolfram Research (Champaign, Ill.), or S-Plus from MathSoft (Cambridge, Mass.). Accordingly,
a software component represents the analytic methods of this invention as programmed in a
procedural language or symbolic package. In a preferred embodiment, the computer system also
contains a database comprising values representing levels of expression of one or more genes
characteristic of breast cancer. The database may contain one or more expression profiles of genes
characteristic of breast cancer in different cells.

In an exemplary implementation, to practice the methods of the present invention, a user first loads
expression profile data into the computer system. These data can be directly entered by the user
from a monitor and keyboard, or from other computer systems linked by a network connection, or
on removable storage media such as a CD-ROM or floppy disk or through the network. Next the
user causes execution of expression profile analysis software which performs the steps of
comparing and, e.g., clustering co-varying genes into groups of genes.

In another exemplary implementation, expression profiles are compared using a method described
in U.S. Patent No. 6,203,987. A user first loads expression profile data into the computer system.
Geneset profile definitions are loaded into the memory from the storage media or from a remote
computer, preferably from a dynamic geneset database system, through the network. Next the user
causes execution of projection software which performs the steps of converting expression profiles
to projected expression profiles. The projected expression profiles are then displayed.

In yet another exemplary implementation, a user first loads a projected profile into the memory.
The user then causes the loading of a reference profile into the memory. Next, the user causes the
execution of comparison software which performs the steps of objectively comparing the profiles.

Detection of variant polynucleotide sequence 

In yet another embodiment, the invention provides methods for determining whether a subject is at
risk for developing a disease, such as a predisposition to develop malignant neoplasia, for example
breast cancer, associated with an aberrant activity of any one of the polypeptides encoded by any
of the polynucleotides of the SEQ ID NO: 1 to 26 or 53 to 75, wherein the aberrant activity of the
polypeptide is characterized by detecting the presence or absence of a genetic lesion characterized
by at least one of these:

(i) an alteration affecting the integrity of a gene encoding a marker polypeptide, or

(ii) the misexpression of the encoding polynucleotide.

To illustrate, such genetic lesions can be detected by ascertaining the existence of at least one of
these:

I. a deletion of one or more nucleotides from the polynucleotide sequence
II. an addition of one or more nucleotides to the polynucleotide sequence
III. a substitution of one or more nucleotides of the polynucleotide sequence
IV. a gross chromosomal rearrangement of the polynucleotide sequence
V. a gross alteration in the level of a messenger RNA transcript of the polynucleotide
sequence
VI. aberrant modification of the polynucleotide sequence, such as of the methylation pattern of
the genomic DNA
VII. the presence of a non-wild type splicing pattern of a messenger RNA transcript of the gene
VIII. a non-wild type level of the marker polypeptide
X. allelic gain of the gene
XI. inappropriate post-translational modification of the marker polypeptide

The present invention provides assay techniques for detecting mutations in the encoding
polynucleotide sequence. These methods include, but are not limited to, methods involving
sequence analysis, Southern blot hybridization, restriction enzyme site mapping, and methods
involving detection of absence of nucleotide pairing between the polynucleotide to be analyzed
and a probe.

Specific diseases or disorders, e.g., genetic diseases or disorders, are associated with specific
allelic variants of polymorphic regions of certain genes, which do not necessarily encode a mutated
protein. Thus, the presence of a specific allelic variant of a polymorphic region of a gene in a
subject can render the subject susceptible to developing a specific disease or disorder.
Polymorphic regions in genes can be identified by determining the nucleotide sequence of genes
in populations of individuals. If a polymorphic region is identified, then the link with a specific
disease can be determined by studying specific populations of individuals, e.g. individuals who
developed a specific disease, such as breast cancer. A polymorphic region can be located in any
region of a gene, e.g., exons, in coding or non coding regions of exons, introns, and promoter
region.

In an exemplary embodiment, there is provided a polynucleotide composition comprising a
polynucleotide probe including a region of nucleotide sequence which is capable of hybridising to
a sense or antisense sequence of a gene or naturally occurring mutants thereof, or 5' or 3' flanking
sequences or intronic sequences naturally associated with the subject genes or naturally occurring
mutants thereof. The polynucleotide of a cell is rendered accessible for hybridization, the probe is
contacted with the polynucleotide of the sample, and the hybridization of the probe to the sample
polynucleotide is detected. Such techniques can be used to detect lesions or allelic variants at
either the genomic or mRNA level, including deletions, substitutions, etc., as well as to determine
mRNA transcript levels.

A preferred detection method is allele specific hybridization using probes overlapping the mutation
or polymorphic site and having about 5, 10, 20, 25, or 30 nucleotides around the mutation or
polymorphic region. In a preferred embodiment of the invention, several probes capable of
hybridising specifically to allelic variants are attached to a solid phase support, e.g., a "chip".
Mutation detection analysis using these chips comprising oligonucleotides, also termed "DNA
probe arrays", is described e.g. in Cronin et al. (119). In one embodiment, a chip comprises all the
allelic variants of at least one polymorphic region of a gene. The solid phase support is then
contacted with a test polynucleotide and hybridization to the specific probes is detected.
Accordingly, the identity of numerous allelic variants of one or more genes can be identified in a
simple hybridization experiment.
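
The logic of reading such a probe array can be sketched in Python as follows. This is a simplified illustration only: hybridization is modelled as perfect base pairing between a probe and the test strand, whereas real hybridization depends on stringency conditions. The probe sequences, names and the function `call_alleles` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def call_alleles(test_sequence, probes):
    """Return the names of probes that would pair perfectly with the test strand."""
    hits = []
    for name, probe in probes.items():
        # A probe pairs with the target where its reverse complement occurs.
        if reverse_complement(probe) in test_sequence:
            hits.append(name)
    return sorted(hits)

# Hypothetical 9-nucleotide probes that differ only at the central position.
probes = {
    "allele_A": reverse_complement("GATCAGTCC"),
    "allele_G": reverse_complement("GATCGGTCC"),
}
test_polynucleotide = "TTGATCGGTCCAA"
detected = call_alleles(test_polynucleotide, probes)
```

In this toy model only the probe for the G variant matches the test polynucleotide, which mirrors how a signal at one array position identifies the allelic variant present.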

In certain embodiments, detection of the lesion comprises utilizing the probe/primer in a
polymerase chain reaction (PCR) (see, e.g. U.S. Patent Nos. 4,683,195 and 4,683,202), such as
anchor PCR or RACE PCR, or, alternatively, in a ligase chain reaction (LCR) [Landegran et al.,
1988, (120) and Nakazawa et al., 1994, (121)], the latter of which can be particularly useful for
detecting point mutations in the gene [Abravaya et al., 1995, (122)]. In a merely illustrative
embodiment, the method includes the steps of (i) collecting a sample of cells from a patient, (ii)
isolating polynucleotide (e.g., genomic, mRNA or both) from the cells of the sample, (iii)
contacting the polynucleotide sample with one or more primers which specifically hybridize to a
polynucleotide sequence under conditions such that hybridization and amplification of the
polynucleotide (if present) occurs, and (iv) detecting the presence or absence of an amplification
product or detecting the size of the amplification product and comparing the length to a control
sample. It is anticipated that PCR and/or LCR may be desirable to use as a preliminary
amplification step in conjunction with any of the techniques used for detecting mutations described
herein.

Alternative amplification methods include: self sustained sequence replication [Guatelli, J.C. et al.,
1990, (123)], transcriptional amplification system [Kwoh, D.Y. et al., 1989, (124)], Q-Beta
replicase [Lizardi, P.M. et al., 1988, (125)], or any other polynucleotide amplification method,
followed by the detection of the amplified molecules using techniques well known to those of skill
in the art. These detection schemes are especially useful for the detection of polynucleotide
molecules if such molecules are present in very low numbers.

In a preferred embodiment of the subject assay, mutations in, or allelic variants of, a gene from a
sample cell are identified by alterations in restriction enzyme cleavage patterns. For example,
sample and control DNA is isolated, amplified (optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined by gel electrophoresis. Moreover,
sequence specific ribozymes (see, for example, U.S. Patent No. 5,498,531) can be used to score
for the presence of specific mutations by development or loss of a ribozyme cleavage site.
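
The comparison of restriction cleavage patterns can be illustrated with a short in silico digest. The sketch below is an assumption-laden toy example: the two sequences are invented, the DNA is treated as a single linear strand, and only one enzyme (EcoRI, which recognizes GAATTC and cuts after the G) is modelled.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the 0-based position of the cut within the site.
    Returns the sorted fragment lengths, as they would be read from a gel.
    """
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    fragments = []
    previous = 0
    for position in cut_positions:
        fragments.append(position - previous)
        previous = position
    fragments.append(len(sequence) - previous)
    return sorted(fragments)

# Hypothetical control DNA with two EcoRI sites.
control = "AAAAGAATTCAAAAAAGAATTCAAA"
# Hypothetical sample DNA in which a substitution destroys the second site.
sample = "AAAAGAATTCAAAAAAGAGTTCAAA"

control_fragments = digest(control, "GAATTC", 1)
sample_fragments = digest(sample, "GAATTC", 1)
pattern_altered = control_fragments != sample_fragments
```

Loss of the second site merges two control fragments into one longer sample fragment, which is the kind of altered pattern that gel electrophoresis would reveal.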

In situ hybridization

In one aspect, the method comprises in situ hybridization with a probe derived from a given marker 
polynucleotide, which sequence is selected from any of the polynucleotide sequences of the SEQ 
ID NO: 1 to 9, or 11 to 19 or 21 to 26 and 53 to 75 or a sequence complementary thereto. The 
method comprises contacting the labeled hybridization probe with a sample of a given type of
tissue from a patient potentially having malignant neoplasia, and breast cancer in particular, as well
as normal tissue from a person with no malignant neoplasia, and determining whether the probe
labels tissue of the patient to a degree significantly different (e.g., by at least a factor of two, or at
least a factor of five, or at least a factor of twenty, or at least a factor of fifty) from the degree to
which normal tissue is labelled.
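
The fold-difference criterion can be expressed as a small Python check. The function names and the signal values are hypothetical, and it is assumed that labeling intensity has already been quantified as a positive number in arbitrary units.

```python
def labeling_fold_difference(patient_signal, normal_signal):
    """Return the factor by which the two labeling signals differ (always >= 1)."""
    if patient_signal <= 0 or normal_signal <= 0:
        raise ValueError("signals must be positive")
    ratio = patient_signal / normal_signal
    # Treat over- and under-labeling symmetrically.
    return ratio if ratio >= 1 else 1 / ratio

def differs_significantly(patient_signal, normal_signal, factor=2.0):
    """True if the signals differ by at least the chosen factor."""
    return labeling_fold_difference(patient_signal, normal_signal) >= factor

# Hypothetical labeling intensities (arbitrary units).
over = differs_significantly(120.0, 20.0, factor=5.0)   # 6-fold higher
under = differs_significantly(10.0, 25.0, factor=2.0)   # 2.5-fold lower
weak = differs_significantly(30.0, 20.0, factor=2.0)    # only 1.5-fold
```

The `factor` argument corresponds to the alternative thresholds listed above (two, five, twenty or fifty).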

Polypeptide detection 

The subject invention further provides a method of determining whether a cell sample obtained
from a subject possesses an abnormal amount of marker polypeptide which comprises (a)
obtaining a cell sample from the subject, (b) quantitatively determining the amount of the marker
polypeptide in the sample so obtained, and (c) comparing the amount of the marker polypeptide so
determined with a known standard, so as to thereby determine whether the cell sample obtained
from the subject possesses an abnormal amount of the marker polypeptide. Such marker
polypeptides may be detected by immunohistochemical assays, dot-blot assays, ELISA and the
like.

Antibodies 

Any type of antibody known in the art can be generated to bind specifically to an epitope of a
"BREAST CANCER GENE" polypeptide. An antibody as used herein includes intact
immunoglobulin molecules, as well as fragments thereof, such as Fab, F(ab')2, and Fv, which are
capable of binding an epitope of a "BREAST CANCER GENE" polypeptide. Typically, at least 6, 8,
10, or 12 contiguous amino acids are required to form an epitope. However, epitopes which involve
non-contiguous amino acids may require more, e.g., at least 15, 25, or 50 amino acids.

An antibody which specifically binds to an epitope of a "BREAST CANCER GENE" polypeptide
can be used therapeutically, as well as in immunochemical assays, such as Western blots, ELISAs,
radioimmunoassays, immunohistochemical assays, immunoprecipitations, or other
immunochemical assays known in the art. Various immunoassays can be used to identify
antibodies having the desired specificity. Numerous protocols for competitive binding or
immunoradiometric assays are well known in the art. Such immunoassays typically involve the
measurement of complex formation between an immunogen and an antibody which specifically
binds to the immunogen.

Typically, an antibody which specifically binds to a "BREAST CANCER GENE" polypeptide
provides a detection signal at least 5-, 10-, or 20-fold higher than a detection signal provided with
other proteins when used in an immunochemical assay. Preferably, antibodies which specifically
bind to "BREAST CANCER GENE" polypeptides do not detect other proteins in immunochemical
assays and can immunoprecipitate a "BREAST CANCER GENE" polypeptide from solution.

"BREAST CANCER GENE" polypeptides can be used to immunize a mammal, such as a mouse,
rat, rabbit, guinea pig, monkey, or human, to produce polyclonal antibodies. If desired, a
"BREAST CANCER GENE" polypeptide can be conjugated to a carrier protein, such as bovine
serum albumin, thyroglobulin, and keyhole limpet hemocyanin. Depending on the host species,
various adjuvants can be used to increase the immunological response. Such adjuvants include,
but are not limited to, Freund's adjuvant, mineral gels (e.g., aluminum hydroxide), and surface
active substances (e.g. lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole
limpet hemocyanin, and dinitrophenol). Among adjuvants used in humans, BCG (bacilli
Calmette-Guerin) and Corynebacterium parvum are especially useful.

Monoclonal antibodies which specifically bind to a "BREAST CANCER GENE" polypeptide can
be prepared using any technique which provides for the production of antibody molecules by
continuous cell lines in culture. These techniques include, but are not limited to, the hybridoma
technique, the human B cell hybridoma technique, and the EBV hybridoma technique [Kohler et
al., 1985, (136); Kozbor et al., 1985, (137); Cote et al., 1983, (138) and Cole et al., 1984, (139)].

In addition, techniques developed for the production of chimeric antibodies, the splicing of mouse
antibody genes to human antibody genes to obtain a molecule with appropriate antigen specificity
and biological activity, can be used [Morrison et al., 1984, (140); Neuberger et al., 1984, (141);
Takeda et al., 1985, (142)]. Monoclonal and other antibodies also can be humanized to prevent a
patient from mounting an immune response against the antibody when it is used therapeutically.
Such antibodies may be sufficiently similar in sequence to human antibodies to be used directly in
therapy or may require alteration of a few key residues. Sequence differences between rodent
antibodies and human sequences can be minimized by replacing residues which differ from those
in the human sequences by site directed mutagenesis of individual residues or by grafting of entire
complementarity determining regions. Alternatively, humanized antibodies can be produced using
recombinant methods, as described in GB2188638B. Antibodies which specifically bind to a
"BREAST CANCER GENE" polypeptide can contain antigen binding sites which are either
partially or fully humanized, as disclosed in U.S. Patent 5,565,332.

Alternatively, techniques described for the production of single chain antibodies can be adapted
using methods known in the art to produce single chain antibodies which specifically bind to
"BREAST CANCER GENE" polypeptides. Antibodies with related specificity, but of distinct
idiotypic composition, can be generated by chain shuffling from random combinatorial
immunoglobulin libraries [Burton, 1991, (143)].

Single-chain antibodies also can be constructed using a DNA amplification method, such as PCR,
using hybridoma cDNA as a template [Thirion et al., 1996, (144)]. Single-chain antibodies can be
mono- or bispecific, and can be bivalent or tetravalent. Construction of tetravalent, bispecific
single-chain antibodies is taught, for example, in Coloma & Morrison, (145). Construction of
bivalent, bispecific single-chain antibodies is taught in Mallender & Voss, (146).

A nucleotide sequence encoding a single-chain antibody can be constructed using manual or 
automated nucleotide synthesis, cloned into an expression construct using standard recombinant 
DNA methods, and introduced into a cell to express the coding sequence, as described below.
Alternatively, single-chain antibodies can be produced directly using, for example, filamentous 
phage technology [Verhaar et al., 1995, (147); Nicholls et al., 1993, (148)]. 

Antibodies which specifically bind to "BREAST CANCER GENE" polypeptides also can be
produced by inducing in vivo production in the lymphocyte population or by screening
immunoglobulin libraries or panels of highly specific binding reagents as disclosed in the literature
[Orlandi et al., 1989, (149) and Winter et al., 1991, (150)].

Other types of antibodies can be constructed and used therapeutically in methods of the invention. 
For example, chimeric antibodies can be constructed as disclosed in WO 93/03151. Binding 
proteins which are derived from immunoglobulins and which are multivalent and multispecific, 
such as the antibodies described in WO 94/13804, also can be prepared.

Antibodies according to the invention can be purified by methods well known in the art. For 
example, antibodies can be affinity purified by passage over a column to which a "BREAST
CANCER GENE" polypeptide is bound. The bound antibodies can then be eluted from the column
using a buffer with a high salt concentration. 

Immunoassays are commonly used to quantify the levels of proteins in cell samples, and many
other immunoassay techniques are known in the art. The invention is not limited to a particular
assay procedure, and therefore is intended to include both homogeneous and heterogeneous
procedures. Exemplary immunoassays which can be conducted according to the invention include
fluorescence polarisation immunoassay (FPIA), fluorescence immunoassay (FIA), enzyme
immunoassay (EIA), nephelometric inhibition immunoassay (NIA), enzyme linked immunosorbent
assay (ELISA), and radioimmunoassay (RIA). An indicator moiety, or label group, can be attached
to the subject antibodies and is selected so as to meet the needs of various uses of the method
which are often dictated by the availability of assay equipment and compatible immunoassay
procedures. General techniques to be used in performing the various immunoassays noted above
are known to those of ordinary skill in the art.

In another embodiment, the level of at least one product encoded by any of the polynucleotide
sequences of the SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19 or 21 to 26 or 53 to 75 or of at least 2
products encoded by a polynucleotide selected from SEQ ID NO: 1 to 26 and 53 to 75 or a
sequence complementary thereto, in a biological fluid (e.g., blood or urine) of a patient may be
determined as a way of monitoring the level of expression of the marker polynucleotide sequence
in cells of that patient. Such a method would include the steps of obtaining a sample of a biological
fluid from the patient, contacting the sample (or proteins from the sample) with an antibody
specific for an encoded marker polypeptide, and determining the amount of immune complex
formation by the antibody, with the amount of immune complex formation being indicative of the
level of the marker encoded product in the sample. This determination is particularly instructive
when compared to the amount of immune complex formation by the same antibody in a control
sample taken from a normal individual or in one or more samples previously or subsequently
obtained from the same person.

In another embodiment, the method can be used to determine the amount of marker polypeptide 
present in a cell, which in turn can be correlated with progression of the disorder, e.g., plaque 
formation. The level of the marker polypeptide can be used predictively to evaluate whether a 
sample of cells contains cells which are, or are predisposed towards becoming, plaque associated
cells. The observation of marker polypeptide level can be utilized in decisions regarding, e.g., the 
use of more stringent therapies. 

As set out above, one aspect of the present invention relates to diagnostic assays for determining,
in the context of cells isolated from a patient, if the level of a marker polypeptide is significantly
reduced in the sample cells. The term "significantly reduced" refers to a cell phenotype wherein
the cell possesses a reduced cellular amount of the marker polypeptide relative to a normal cell of
similar tissue origin. For example, a cell may have less than about 50%, 25%, 10%, or 5% of the
marker polypeptide of a normal control cell. In particular, the assay evaluates the level of marker
polypeptide in the test cells, and, preferably, compares the measured level with marker polypeptide
detected in at least one control cell, e.g., a normal cell and/or a transformed cell of known
phenotype.
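
The "significantly reduced" criterion amounts to a simple ratio test against a control cell. The following Python sketch assumes that marker polypeptide amounts have already been measured in the same units for test and control cells; the function name and the numbers are hypothetical.

```python
def is_significantly_reduced(test_amount, control_amount, threshold=0.5):
    """True if the test cell holds less than `threshold` of the control amount.

    threshold is a fraction, e.g. 0.5, 0.25, 0.10 or 0.05.
    """
    if control_amount <= 0:
        raise ValueError("control amount must be positive")
    return test_amount / control_amount < threshold

# Hypothetical measurements: test cell at 40% of the normal control.
reduced_at_50 = is_significantly_reduced(40.0, 100.0, threshold=0.5)
reduced_at_25 = is_significantly_reduced(40.0, 100.0, threshold=0.25)
```

The same measurement therefore counts as reduced under the 50% cut-off but not under the stricter 25% cut-off, so the threshold has to be fixed before samples are compared.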

Of particular importance to the subject invention is the ability to quantify the level of marker 
polypeptide as determined by the number of cells associated with a normal or abnormal marker 
polypeptide level. The number of cells with a particular marker polypeptide phenotype may then 
be correlated with patient prognosis. In one embodiment of the invention, the marker polypeptide
phenotype of the lesion is determined as a percentage of cells in a biopsy which are found to have
abnormally high/low levels of the marker polypeptide. Such expression may be detected by
immunohistochemical assays, dot-blot assays, ELISA and the like.
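
Expressing the phenotype of a lesion as a percentage of abnormal cells can be sketched as follows. The per-cell levels, the normal range and the function name are hypothetical; in practice they would come from scored staining intensities.

```python
def percent_abnormal(cell_levels, low, high):
    """Percentage of cells whose marker level lies outside the range [low, high]."""
    if not cell_levels:
        raise ValueError("no cells scored")
    abnormal = sum(1 for level in cell_levels if level < low or level > high)
    return 100.0 * abnormal / len(cell_levels)

# Hypothetical per-cell marker levels, normalized to a normal control (1.0).
levels = [0.9, 1.1, 3.2, 0.2, 1.0, 2.8, 1.05, 0.95]
share = percent_abnormal(levels, low=0.5, high=2.0)
```

Here three of the eight scored cells fall outside the assumed normal range, giving 37.5% abnormal cells for this biopsy.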

Immunohistochemistry

Where tissue samples are employed, immunohistochemical staining may be used to determine the 
number of cells having the marker polypeptide phenotype. For such staining, a multiblock of tissue 
is taken from the biopsy or other tissue sample and subjected to proteolytic hydrolysis, employing 
such agents as protease K or pepsin. In certain embodiments, it may be desirable to isolate a 
nuclear fraction from the sample cells and detect the level of the marker polypeptide in the nuclear
fraction. 

The tissue samples are fixed by treatment with a reagent such as formalin, glutaraldehyde,
methanol, or the like. The samples are then incubated with an antibody, preferably a monoclonal
antibody, with binding specificity for the marker polypeptides. This antibody may be conjugated to
a label for subsequent detection of binding. Samples are incubated for a time sufficient for
formation of the immunocomplexes. Binding of the antibody is then detected by virtue of a label
conjugated to this antibody. Where the antibody is unlabelled, a second labeled antibody may be
employed, e.g., which is specific for the isotype of the anti-marker polypeptide antibody. Examples
of labels which may be employed include radionuclides, fluorescence, chemiluminescence, and
enzymes.

Where enzymes are employed, the substrate for the enzyme may be added to the samples to
provide a colored or fluorescent product. Examples of suitable enzymes for use in conjugates
include horseradish peroxidase, alkaline phosphatase, malate dehydrogenase and the like. Where
not commercially available, such antibody-enzyme conjugates are readily produced by techniques
known to those skilled in the art.

In one embodiment, the assay is performed as a dot blot assay. The dot blot assay finds particular 
application where tissue samples are employed as it allows determination of the average amount of 
the marker polypeptide associated with a single cell by correlating the amount of marker
polypeptide in a cell-free extract produced from a predetermined number of cells. 
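
The per-cell average obtained from a dot blot is a simple division of the amount found in the extract by the number of cells used to prepare it. The Python sketch below uses invented numbers and assumes complete extraction of the marker polypeptide.

```python
def average_amount_per_cell(extract_amount_ng, cell_count):
    """Average marker polypeptide per cell (ng), assuming complete extraction."""
    if cell_count <= 0:
        raise ValueError("cell count must be positive")
    return extract_amount_ng / cell_count

# Hypothetical dot blot result: 50 ng marker polypeptide from 100,000 cells.
per_cell_ng = average_amount_per_cell(50.0, 100_000)
per_cell_pg = per_cell_ng * 1000.0  # convert ng to pg
```

With these assumed values the average works out to about 0.5 pg of marker polypeptide per cell.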



In yet another embodiment, the invention contemplates using one or more antibodies which are
generated against one or more of the marker polypeptides of this invention, which polypeptides are
encoded by any of the polynucleotide sequences of the SEQ ID NO: 1 to 26 or 53 to 75. Such a
panel of antibodies may be used as a reliable diagnostic probe for breast cancer. The assay of the
present invention comprises contacting a biopsy sample containing cells, e.g., macrophages, with a
panel of antibodies to one or more of the encoded products to determine the presence or absence of
the marker polypeptides.

The diagnostic methods of the subject invention may also be employed as follow-up to treatment,
e.g., quantification of the level of marker polypeptides may be indicative of the effectiveness of 
current or previously employed therapies for malignant neoplasia and breast cancer in particular as 
well as the effect of these therapies upon patient prognosis. 

The diagnostic assays described above can be adapted to be used as prognostic assays, as well.
Such an application takes advantage of the sensitivity of the assays of the invention to events
which take place at characteristic stages in the progression of plaque generation in case of
malignant neoplasia. For example, a given marker gene may be up- or down-regulated at a very
early stage, perhaps before the cell is developing into a foam cell, while another marker gene may
be characteristically up- or down-regulated only at a much later stage. Such a method could involve
the steps of contacting the mRNA of a test cell with a polynucleotide probe derived from a given
marker polynucleotide which is expressed at different characteristic levels in breast cancer tissue
cells at different stages of malignant neoplasia progression, and determining the approximate
amount of hybridization of the probe to the mRNA of the cell, such amount being an indication of
the level of expression of the gene in the cell, and thus an indication of the stage of disease
progression of the cell; alternatively, the assay can be carried out with an antibody specific for the
gene product of the given marker polynucleotide, contacted with the proteins of the test cell. A
battery of such tests will disclose not only the existence of a certain arteriosclerotic plaque, but
also will allow the clinician to select the mode of treatment most appropriate for the disease, and to
predict the likelihood of success of that treatment.

The methods of the invention can also be used to follow the clinical course of a given breast
cancer predisposition. For example, the assay of the invention can be applied to a blood sample
from a patient; following treatment of the patient for BREAST CANCER, another blood sample is
taken and the test repeated. Successful treatment will result in removal of cells which demonstrate
differential expression, characteristic of the breast cancer tissue cells, perhaps approaching or even
surpassing normal levels.

Polypeptide activity 



In one embodiment the present invention provides a method for screening potentially therapeutic
agents which modulate the activity of one or more "BREAST CANCER GENE" polypeptides,
such that if the activity of the polypeptide is increased as a result of the upregulation of the
"BREAST CANCER GENE" in a subject having or at risk for malignant neoplasia and breast
cancer in particular, the therapeutic substance will decrease the activity of the polypeptide relative
to the activity of the same polypeptide in a subject not having or not at risk for malignant neoplasia
or breast cancer in particular but not treated with the therapeutic agent. Likewise, if the activity of
the polypeptide as a result of the downregulation of the "BREAST CANCER GENE" is decreased
in a subject having or at risk for malignant neoplasia or breast cancer in particular, the therapeutic
agent will increase the activity of the polypeptide relative to the activity of the same polypeptide in
a subject not having or not at risk for malignant neoplasia or breast cancer in particular, but not
treated with the therapeutic agent.

The activity of the "BREAST CANCER GENE" polypeptides indicated in Table 2 or 3 may be
measured by any means known to those of skill in the art, and which are particular for the type of
activity performed by the particular polypeptide. Examples of specific assays which may be used
to measure the activity of particular polynucleotides are shown below.

a) G protein coupled receptors

In one embodiment, the "BREAST CANCER GENE" polynucleotide may encode a G protein
coupled receptor. In one embodiment, the present invention provides a method of screening
potential modulators (inhibitors or activators) of the G protein coupled receptor by measuring
changes in the activity of the receptor in the presence of a candidate modulator.

1. Gi-coupled receptors

Cells (such as CHO cells or primary cells) are stably transfected with the relevant receptor and
with an inducible CRE-luciferase construct. Cells are grown in 50% Dulbecco's modified Eagle
medium / 50% F12 (DMEM/F12) supplemented with 10% FBS, at 37°C in a humidified
atmosphere with 10% CO2 and are routinely split at a ratio of 1:10 every 2 or 3 days. Test cultures
are seeded into 384-well plates at an appropriate density (e.g. 2000 cells / well in 35 µl cell
culture medium) in DMEM/F12 with FBS, and are grown for 48 hours (range: ~24-60 hours,
depending on cell line). Growth medium is then exchanged against serum free medium (SFM; e.g.
Ultra-CHO), containing 0,1% BSA. Test compounds dissolved in DMSO are diluted in SFM and
transferred to the test cultures (maximal final concentration 10 nmolar), followed by addition of
forskolin (~1 µmolar, final conc.) in SFM + 0,1% BSA 10 minutes later. In case of antagonist
screening, both an appropriate concentration of agonist and forskolin are added. The plates are
incubated at 37°C in 10% CO2 for 3 hours. Then the supernatant is removed and the cells are lysed
with lysis reagent (25 mmolar phosphate-buffer, pH 7,8, containing 2 mmolar DTT, 10% glycerol
and 3% Triton X100). The luciferase reaction is started by addition of substrate-buffer (e.g.
luciferase assay reagent, Promega) and luminescence is immediately determined (e.g. Berthold
luminometer or Hamamatsu camera system).

2. Gs-coupled receptors

Cells (such as CHO cells or primary cells) are stably transfected with the relevant receptor and
with an inducible CRE-luciferase construct. Cells are grown in 50% Dulbecco's modified Eagle
medium / 50% F12 (DMEM/F12) supplemented with 10% FBS, at 37°C in a humidified
atmosphere with 10% CO2 and are routinely split at a ratio of 1:10 every 2 or 3 days. Test cultures
are seeded into 384-well plates at an appropriate density (e.g. 1000 or 2000 cells / well in 35 µl
cell culture medium) in DMEM/F12 with FBS, and are grown for 48 hours (range: ~24-60 hours,
depending on cell line). The assay is started by addition of test compounds in serum free medium
(SFM; e.g. Ultra-CHO) containing 0,1% BSA: Test compounds are dissolved in DMSO, diluted in
SFM and transferred to the test cultures (maximal final concentration 10 µmolar, DMSO conc. <
0,6 %). In case of antagonist screening an appropriate concentration of agonist is added 5-10
minutes later. The plates are incubated at 37°C in 10% CO2 for 3 hours. Then the cells are lysed
with 10 µl lysis reagent per well (25 mmolar phosphate-buffer, pH 7,8, containing 2 mmolar DTT,
10% glycerol and 3% Triton X100) and the luciferase reaction is started by addition of 20 µl
substrate-buffer per well (e.g. luciferase assay reagent, Promega). Measurement of luminescence is
started immediately (e.g. Berthold luminometer or Hamamatsu camera system).

3. Gq-coupled receptors

Cells (such as CHO cells or primary cells) are stably transfected with the relevant receptor. Cells
expressing functional receptor protein are grown in 50% Dulbecco's modified Eagle medium /
50% F12 (DMEM/F12) supplemented with 10% FBS, at 37°C in a humidified atmosphere with
5% CO2 and are routinely split at a cell line dependent ratio every 3 or 4 days. Test cultures are
seeded into 384-well plates at an appropriate density (e.g. 2000 cells / well in 35 µl cell culture
medium) in DMEM/F12 with FBS, and are grown for 48 hours (range: ~24-60 hours, depending
on cell line). Growth medium is then exchanged against physiological salt solution (e.g. Tyrode
solution). Test compounds dissolved in DMSO are diluted in Tyrode solution containing 0.1%
BSA and transferred to the test cultures (maximal final concentration 10 mmolar). After addition of
the receptor specific agonist the resulting Gq-mediated intracellular calcium increase is measured
using appropriate read-out systems (e.g. calcium-sensitive dyes).
b) Ion channels

Ion channels are integral membrane proteins involved in electrical signaling, transmembrane signal
transduction, and electrolyte and solute transport. By forming macromolecular pores through the
membrane lipid bilayer, ion channels account for the flow of specific ion species driven by the
electrochemical potential gradient for the permeating ion. At the single molecule level, individual
channels undergo conformational transitions ("gating") between the 'open' (ion conducting) and
'closed' (non conducting) state. Typical single channel openings last for a few milliseconds and
result in elementary transmembrane currents in the range of 10^-9 - 10^-12 Ampere. Channel gating is
controlled by various chemical and/or biophysical parameters, such as neurotransmitters and
intracellular second messengers ('ligand-gated' channels) or membrane potential ('voltage-gated'
channels). Ion channels are functionally characterized by their ion selectivity, gating properties,
and regulation by hormones and pharmacological agents. Because of their central role in signaling
and transport processes, ion channels present ideal targets for pharmacological therapeutics in
various pathophysiological settings.

In one embodiment, the "BREAST CANCER GENE" may encode an ion channel. In one
embodiment, the present invention provides a method of screening potential activators or
inhibitors of channel activity of the "BREAST CANCER GENE" polypeptide. Screening for
compounds interacting with ion channels to either inhibit or promote their activity can be based on
(1.) binding and (2.) functional assays in living cells [Hille (183)].

1. For ligand-gated channels, e.g. ionotropic neurotransmitter/hormone receptors, assays can
be designed to detect binding to the target by competition between the compound and a
labeled ligand.



2. Ion channel function can be tested functionally in living cells. Target proteins are either
expressed endogenously in appropriate reporter cells or are introduced recombinantly.
Channel activity can be monitored by (2.1) concentration changes of the permeating ion
(most prominently Ca2+ ions), (2.2) by changes in the transmembrane electrical potential
gradient, and (2.3) by measuring a cellular response (e.g. expression of a reporter gene,
secretion of a neurotransmitter) triggered or modulated by the target activity.

2.1 Channel activity results in transmembrane ion fluxes. Thus activation of ionic channels
can be monitored by the resulting changes in intracellular ion concentrations using
luminescent or fluorescent indicators. Because of its wide dynamic range and availability
of suitable indicators this applies particularly to changes in intracellular Ca2+ ion
concentration ([Ca2+]i). [Ca2+]i can be measured, for example, by aequorin luminescence or
fluorescence dye technology (e.g. using Fluo-3, Indo-1, Fura-2). Cellular assays can be
designed where either the Ca2+ flux through the target channel itself is measured directly or
where modulation of the target channel affects membrane potential and thereby the activity
of co-expressed voltage-gated Ca2+ channels.

2.2 Ion channel currents result in changes of electrical membrane potential (Vm), which can be
monitored directly using potentiometric fluorescent probes. These electrically charged
indicators (e.g. the anionic oxonol dye DiBAC4(3)) redistribute between extra- and
intracellular compartments in response to voltage changes. The equilibrium distribution is
governed by the Nernst equation. Thus changes in membrane potential result in
concomitant changes in cellular fluorescence. Again, changes in Vm might be caused
directly by the activity of the target ion channel or through amplification and/or
prolongation of the signal by channels co-expressed in the same cell.

2.3 Target channel activity can cause cellular Ca2+ entry either directly or through activation of
additional Ca2+ channels (see 2.1). The resulting intracellular Ca2+ signals regulate a variety
of cellular responses, e.g. secretion or gene transcription. Therefore modulation of the
target channel can be detected by monitoring secretion of a known hormone/transmitter
from the target-expressing cell or through expression of a reporter gene (e.g. luciferase)
controlled by a Ca2+-responsive promoter element (e.g. cyclic AMP/Ca2+-responsive
elements; CRE).
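
The Nernst relation referred to under 2.2 can be illustrated numerically. The following is a minimal sketch only, assuming an ideal, freely permeant monovalent anionic dye at 37°C and ignoring any binding of the dye to membranes or proteins. The function name `dye_in_out_ratio` and the membrane potentials used are illustrative assumptions and are not taken from this disclosure.

```python
import math

R = 8.314462618      # gas constant, J/(mol*K)
F = 96485.33212      # Faraday constant, C/mol

def dye_in_out_ratio(v_m, z=-1, temp_k=310.15):
    """Equilibrium ratio [dye]in/[dye]out for a permeant ion of valence z.

    v_m is the membrane potential in volts (inside relative to outside).
    Derived from the Nernst equation V_m = (R*T/(z*F)) * ln(c_out/c_in).
    """
    return math.exp(-z * F * v_m / (R * temp_k))

# Illustrative potentials (assumed values): resting, partly depolarized, fully depolarized.
for v_mv in (-70, -30, 0):
    ratio = dye_in_out_ratio(v_mv / 1000.0)
    print(f"Vm = {v_mv:4d} mV -> [dye]in/[dye]out = {ratio:.3f}")
```

Under these assumptions the intracellular share of an anionic dye rises as the cell depolarizes, which is why an increase in cellular fluorescence can be read as depolarization.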

c) DNA-binding proteins and transcription factors 

In one embodiment, the "BREAST CANCER GENE" may encode a DNA-binding protein or a
transcription factor. The activity of such a DNA-binding protein or a transcription factor may be
measured, for example, by a promoter assay which measures the ability of the DNA-binding
protein or the transcription factor to initiate transcription of a test sequence linked to a particular
promoter. In one embodiment, the present invention provides a method of screening test
compounds for their ability to modulate the activity of such a DNA-binding protein or a transcription
factor by measuring the changes in the expression of a test gene which is regulated by a promoter
which is responsive to the transcription factor.

A promoter assay was set up with the human hepatocellular carcinoma cell line HepG2, stably
transfected with a luciferase gene under the control of a promoter regulated by a gene of interest
(e.g. thyroid hormone). The vector 2xIROluc, which was used for transfection, carries a thyroid
hormone responsive element (TRE) of two 12 bp inverted palindromes separated by an 8 bp spacer
in front of a tk minimal promoter and the luciferase gene. Test cultures were seeded in 96-well
plates in serum-free Eagle's Minimal Essential Medium supplemented with glutamine, tricine,
sodium pyruvate, non-essential amino acids, insulin, selenium and transferrin, and were cultivated in a
humidified atmosphere of 10% CO2 at 37°C. After 48 hours of incubation, serial dilutions of test
compounds or reference compounds (e.g. L-T3, L-T4) and, if appropriate, a co-stimulator (final
concentration 1 nM) were added to the cell cultures and incubation was continued for the optimal
time (e.g. another 4-72 hours). The cells were then lysed by addition of buffer containing Triton
X-100 and luciferin, and the luminescence of luciferase induced by T3 or other compounds was
measured in a luminometer. Each concentration of a test compound was tested in four replicates.
EC50 values for each test compound were calculated using the GraphPad Prism scientific
software.
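
How an EC50 value may be derived from replicate luminescence readings can be sketched as follows. This is not the curve-fitting procedure of the GraphPad Prism software; it is a deliberately simplified estimate that averages the replicates and interpolates on the logarithmic concentration axis at the half-maximal response. The function name `ec50_by_interpolation` and the synthetic readings are assumptions made for illustration.

```python
import math
from statistics import mean

def ec50_by_interpolation(concs, replicate_signals):
    """Estimate EC50 by log-linear interpolation at the half-maximal response.

    concs: ascending test concentrations (any unit, all > 0).
    replicate_signals: one list of replicate luminescence readings per concentration.
    """
    means = [mean(r) for r in replicate_signals]
    bottom, top = min(means), max(means)
    half = bottom + 0.5 * (top - bottom)
    # Find the first pair of neighbouring concentrations that brackets the half-maximal signal.
    for i in range(len(concs) - 1):
        lo, hi = means[i], means[i + 1]
        if lo <= half <= hi and hi > lo:
            frac = (half - lo) / (hi - lo)
            log_lo, log_hi = math.log10(concs[i]), math.log10(concs[i + 1])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    raise ValueError("response does not cross the half-maximal level")

# Synthetic example (assumed data): four replicates per concentration, in nM.
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
signals = [[100 + 900 * c / (c + 1.0) + d for d in (-5, 5, -2, 2)] for c in concs]
estimate = ec50_by_interpolation(concs, signals)
```

A full four-parameter logistic fit would normally be preferred for real data; the interpolation above only serves to show what the EC50 value represents.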

Screening Methods 

The invention provides assays for screening test compounds which bind to or modulate the activity 
of a "BREAST CANCER GENE" polypeptide or a "BREAST CANCER GENE" polynucleotide.
A test compound preferably binds to a "BREAST CANCER GENE" polypeptide or
polynucleotide. More preferably, a test compound decreases or increases "BREAST CANCER
GENE" activity by at least about 10, preferably about 50, more preferably about 75, 90, or 100% 
relative to the absence of the test compound. 

Test Compounds 

Test compounds can be pharmacological agents already known in the art or can be compounds 
previously unknown to have any pharmacological activity. The compounds can be naturally 
occurring or designed in the laboratory. They can be isolated from microorganisms, animals, or 
plants, and can be produced recombinantly, or synthesised by chemical methods known in the art. If
desired, test compounds can be obtained using any of the numerous combinatorial library methods
known in the art including, but not limited to, biological libraries, spatially addressable parallel
solid phase or solution phase libraries, synthetic library methods requiring deconvolution, the
one-bead one-compound library method, and synthetic library methods using affinity chromatography
selection. The biological library approach is limited to polypeptide libraries, while the other four
approaches are applicable to polypeptide, non-peptide oligomer, or small molecule libraries of 
compounds. [For review see Lam, 1997, (151)]. 

Methods for the synthesis of molecular libraries are well known in the art [see, for example, 
DeWitt et al., 1993, (152); Erb et al., 1994, (153); Zuckermann et al., 1994, (154); Cho et al., 1993,
(155); Carell et al., 1994, (156) and Gallop et al., 1994, (157)]. Libraries of compounds can be
presented in solution [see, e.g., Houghten, 1992, (158)], or on beads [Lam, 1991, (159)],
DNA-chips [Fodor, 1993, (160)], bacteria or spores (Ladner, U.S. Patent 5,223,409), plasmids [Cull et
al., 1992, (161)], or phage [Scott & Smith, 1990, (162); Devlin, 1990, (163); Cwirla et al., 1990, 
(164); Felici, 1991, (165)]. 

High Throughput Screening

Test compounds can be screened for the ability to bind to "BREAST CANCER GENE"
polypeptides or polynucleotides or to affect "BREAST CANCER GENE" activity or "BREAST
CANCER GENE" expression using high throughput screening. Using high throughput screening,
many discrete compounds can be tested in parallel so that large numbers of test compounds can be
quickly screened. The most widely established techniques utilize 96-well, 384-well or 1536-well
microtiter plates. The wells of the microtiter plates typically require assay volumes that range from
5 to 500 µl. In addition to the plates, many instruments, materials, pipettors, robotics, plate
washers, and plate readers are commercially available to fit the microwell formats.
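
For automated handling, the wells of the plate formats mentioned above are usually addressed by a row letter and a column number. The following sketch assumes the common 8 x 12, 16 x 24 and 32 x 48 layouts, row-major numbering, and row labels running A to Z and then AA to AF; these conventions and the helper names `row_label` and `well_name` are assumptions and are not specified in this disclosure.

```python
PLATE_FORMATS = {96: (8, 12), 384: (16, 24), 1536: (32, 48)}  # wells -> (rows, columns)

def row_label(row):
    """Letter label for a zero-based row: A..Z, then AA, AB, ... (needed for 1536-well plates)."""
    label = ""
    row += 1
    while row > 0:
        row, rem = divmod(row - 1, 26)
        label = chr(ord("A") + rem) + label
    return label

def well_name(index, plate_size=96):
    """Name of the well at a zero-based, row-major index, e.g. index 0 -> 'A1'."""
    rows, cols = PLATE_FORMATS[plate_size]
    if not 0 <= index < rows * cols:
        raise ValueError("well index outside the plate")
    row, col = divmod(index, cols)
    return f"{row_label(row)}{col + 1}"
```

With such a mapping, each test compound in a screening run can be tracked by plate and well name regardless of the plate density used.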

Alternatively, free format assays, or assays that have no physical barrier between samples, can be 
used. For example, an assay using pigment cells (melanocytes) in a simple homogeneous assay for
combinatorial peptide libraries is described by Jayawickreme et al., (166). The cells are placed
under agarose in culture dishes, then beads that carry combinatorial compounds are placed on the
surface of the agarose. The combinatorial compounds are partially released from
the beads. Active compounds can be visualised as dark pigment areas because, as the compounds
diffuse locally into the gel matrix, the active compounds cause the cells to change colors.

Another example of a free format assay is described by Chelsky, (167). Chelsky placed a simple 
homogenous enzyme assay for carbonic anhydrase inside an agarose gel such that the enzyme in 
the gel would cause a color change throughout the gel. Thereafter, beads carrying combinatorial 
compounds via a photolinker were placed inside the gel and the compounds were partially released 
by UV light. Compounds that inhibited the enzyme were observed as local zones of inhibition
having less color change.

In another example, combinatorial libraries were screened for compounds that had cytotoxic
effects on cancer cells growing in agar [Salmon et al., 1996, (168)].

Another high throughput screening method is described in Beutel et al., U.S. Patent 5,976,813. In
this method, test samples are placed in a porous matrix. One or more assay components are then
placed within, on top of, or at the bottom of a matrix such as a gel, a plastic sheet, a filter, or other
form of easily manipulated solid support. When samples are introduced to the porous matrix they 
diffuse sufficiently slowly, such that the assays can be performed without the test samples running 
together. 

Binding Assays 

For binding assays, the test compound is preferably a small molecule which binds to and occupies,
for example, the ATP/GTP binding site of the enzyme or the active site of a "BREAST CANCER
GENE" polypeptide, such that normal biological activity is prevented. Examples of such small
molecules include, but are not limited to, small peptides or peptide-like molecules. 

In binding assays, either the test compound or a "BREAST CANCER GENE" polypeptide can
comprise a detectable label, such as a fluorescent, radioisotopic, chemiluminescent, or enzymatic
label, such as horseradish peroxidase, alkaline phosphatase, or luciferase. Detection of a test
compound which is bound to a "BREAST CANCER GENE" polypeptide can then be
accomplished, for example, by direct counting of radioemission, by scintillation counting, or by
determining conversion of an appropriate substrate to a detectable product.

Alternatively, binding of a test compound to a "BREAST CANCER GENE" polypeptide can be
determined without labeling either of the interactants. For example, a microphysiometer can be
used to detect binding of a test compound with a "BREAST CANCER GENE" polypeptide. A
microphysiometer (e.g., Cytosensor™) is an analytical instrument that measures the rate at which a
cell acidifies its environment using a light-addressable potentiometric sensor (LAPS). Changes in
this acidification rate can be used as an indicator of the interaction between a test compound and a
"BREAST CANCER GENE" polypeptide [McConnell et al., 1992, (169)].

Determining the ability of a test compound to bind to a "BREAST CANCER GENE" polypeptide
also can be accomplished using a technology such as real-time Biomolecular Interaction Analysis
(BIA) [Sjolander & Urbaniczky, 1991, (170), and Szabo et al., 1995, (171)]. BIA is a technology
for studying biospecific interactions in real time, without labeling any of the interactants (e.g.,
BIAcore™). Changes in the optical phenomenon surface plasmon resonance (SPR) can be used as
an indication of real-time reactions between biological molecules.

In yet another aspect of the invention, a "BREAST CANCER GENE" polypeptide can be used as a
"bait protein" in a two-hybrid assay or three-hybrid assay [see, e.g., U.S. Patent 5,283,317; Zervos
et al., 1993, (172); Madura et al., 1993, (173); Bartel et al., 1993, (174); Iwabuchi et al., 1993,
(175) and Brent WO 94/10300], to identify other proteins which bind to or interact with the
"BREAST CANCER GENE" polypeptide and modulate its activity.

The two-hybrid system is based on the modular nature of most transcription factors, which consist 
of separable DNA-binding and activation domains. Briefly, the assay utilizes two different DNA 
constructs. For example, in one construct, a polynucleotide encoding a "BREAST CANCER GENE"
polypeptide can be fused to a polynucleotide encoding the DNA binding domain of a known
transcription factor (e.g., GAL4). In the other construct, a DNA sequence that encodes an
unidentified protein ("prey" or "sample") can be fused to a polynucleotide that codes for the
activation domain of the known transcription factor. If the "bait" and the "prey" proteins are able
to interact in vivo to form a protein-dependent complex, the DNA-binding and activation
domains of the transcription factor are brought into close proximity. This proximity allows
transcription of a reporter gene (e.g., LacZ), which is operably linked to a transcriptional
regulatory site responsive to the transcription factor. Expression of the reporter gene can be
detected, and cell colonies containing the functional transcription factor can be isolated and used
to obtain the DNA sequence encoding the protein which interacts with the "BREAST CANCER
GENE" polypeptide.

It may be desirable to immobilize either a "BREAST CANCER GENE" polypeptide (or
polynucleotide) or the test compound to facilitate separation of bound from unbound forms of one
or both of the interactants, as well as to accommodate automation of the assay. Thus, either a
"BREAST CANCER GENE" polypeptide (or polynucleotide) or the test compound can be bound
to a solid support. Suitable solid supports include, but are not limited to, glass or plastic slides,
tissue culture plates, microtiter wells, tubes, silicon chips, or particles such as beads (including, but
not limited to, latex, polystyrene, or glass beads). Any method known in the art can be used to
attach a "BREAST CANCER GENE" polypeptide (or polynucleotide) or test compound to a solid
support, including use of covalent and non-covalent linkages, passive absorption, or pairs of
binding moieties attached respectively to the polypeptide (or polynucleotide) or test compound and
the solid support. Test compounds are preferably bound to the solid support in an array, so that the
location of individual test compounds can be tracked. Binding of a test compound to a "BREAST
CANCER GENE" polypeptide (or polynucleotide) can be accomplished in any vessel suitable for
containing the reactants. Examples of such vessels include microtiter plates, test tubes, and
microcentrifuge tubes.

In one embodiment, a "BREAST CANCER GENE" polypeptide is a fusion protein comprising a
domain that allows the "BREAST CANCER GENE" polypeptide to be bound to a solid support.
For example, glutathione S-transferase fusion proteins can be adsorbed onto glutathione sepharose
beads (Sigma Chemical, St. Louis, Mo.) or glutathione derivatized microtiter plates, which are
then combined with the test compound or the test compound and the nonadsorbed "BREAST
CANCER GENE" polypeptide; the mixture is then incubated under conditions conducive to
complex formation (e.g., at physiological conditions for salt and pH). Following incubation, the
beads or microtiter plate wells are washed to remove any unbound components. Binding of the
interactants can be determined either directly or indirectly, as described above. Alternatively, the
complexes can be dissociated from the solid support before binding is determined.

Other techniques for immobilising proteins or polynucleotides on a solid support also can be used 
in the screening assays of the invention. For example, either a "BREAST CANCER GENE"
polypeptide (or polynucleotide) or a test compound can be immobilized utilizing conjugation of
biotin and streptavidin. Biotinylated "BREAST CANCER GENE" polypeptides (or
polynucleotides) or test compounds can be prepared from biotin NHS (N-hydroxysuccinimide)
using techniques well known in the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, Ill.)
and immobilized in the wells of streptavidin-coated 96 well plates (Pierce Chemical).
Alternatively, antibodies which specifically bind to a "BREAST CANCER GENE" polypeptide,
polynucleotide, or a test compound, but which do not interfere with a desired binding site, such as
the ATP/GTP binding site or the active site of the "BREAST CANCER GENE" polypeptide, can
be derivatised to the wells of the plate. Unbound target or protein can be trapped in the wells by
antibody conjugation. 

Methods for detecting such complexes, in addition to those described above for the GST- 
immobilized complexes, include immunodetection of complexes using antibodies which 
specifically bind to a "BREAST CANCER GENE" polypeptide or test compound, enzyme-linked
assays which rely on detecting an activity of a "BREAST CANCER GENE" polypeptide, and SDS
gel electrophoresis under non-reducing conditions. 

Screening for test compounds which bind to a "BREAST CANCER GENE" polypeptide or
polynucleotide also can be carried out in an intact cell. Any cell which comprises a "BREAST
CANCER GENE" polypeptide or polynucleotide can be used in a cell-based assay system. A
"BREAST CANCER GENE" polynucleotide can be naturally occurring in the cell or can be
introduced using techniques such as those described above. Binding of the test compound to a
"BREAST CANCER GENE" polypeptide or polynucleotide is determined as described above.

In another embodiment, test compounds which increase or decrease "BREAST CANCER GENE"
expression are identified. A "BREAST CANCER GENE" polynucleotide is contacted with a test
compound, and the expression of an RNA or polypeptide product of the "BREAST CANCER
GENE" polynucleotide is determined. The level of expression of appropriate mRNA or
polypeptide in the presence of the test compound is compared to the level of expression of mRNA or
polypeptide in the absence of the test compound. The test compound can then be identified as a
modulator of expression based on this comparison. For example, when expression of mRNA or
polypeptide is greater in the presence of the test compound than in its absence, the test compound
is identified as a stimulator or enhancer of the mRNA or polypeptide expression. Alternatively,
when expression of the mRNA or polypeptide is less in the presence of the test compound than in
its absence, the test compound is identified as an inhibitor of the mRNA or polypeptide expression.
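
The comparison described above can be sketched in a few lines. This is an illustrative sketch only: the function name `classify_modulator` is hypothetical, and the default cutoff of 10% is an assumption borrowed from the "at least about 10%" figure used elsewhere in this disclosure; in practice a cutoff would be chosen with regard to the noise of the particular assay.

```python
def classify_modulator(expr_with, expr_without, min_change_pct=10.0):
    """Compare expression with and without a test compound.

    Returns (label, percent_change), where percent_change is relative to the
    expression level measured in the absence of the test compound.
    """
    if expr_without <= 0:
        raise ValueError("control expression must be positive")
    change_pct = 100.0 * (expr_with - expr_without) / expr_without
    if change_pct >= min_change_pct:
        label = "stimulator"
    elif change_pct <= -min_change_pct:
        label = "inhibitor"
    else:
        label = "no clear effect"
    return label, change_pct
```

For example, a compound raising a normalized mRNA signal from 100 to 150 units would be labeled a stimulator with a change of +50%, whereas a drop from 100 to 25 units would be labeled an inhibitor.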

The level of "BREAST CANCER GENE" mRNA or polypeptide expression in the cells can be
determined by methods well known in the art for detecting mRNA or polypeptide. Either
qualitative or quantitative methods can be used. The presence of polypeptide products of a
"BREAST CANCER GENE" polynucleotide can be determined, for example, using a variety of
techniques known in the art, including immunochemical methods such as radioimmunoassay,
Western blotting, and immunohistochemistry. Alternatively, polypeptide synthesis can be
determined in vivo, in a cell culture, or in an in vitro translation system by detecting incorporation
of labeled amino acids into a "BREAST CANCER GENE" polypeptide.

Such screening can be carried out either in a cell-free assay system or in an intact cell. Any cell 
which expresses a "BREAST CANCER GENE" polynucleotide can be used in a cell-based assay
system. A "BREAST CANCER GENE" polynucleotide can be naturally occurring in the cell or
can be introduced using techniques such as those described above. Either a primary culture or an
established cell line, such as CHO or human embryonic kidney 293 cells, can be used.

Therapeutic Indications and Methods

Therapies for treatment of breast cancer have primarily relied upon effective chemotherapeutic drugs
for intervention in cell proliferation, cell growth or angiogenesis. The advent of genomics-driven
molecular target identification has opened up the possibility of identifying new breast
cancer-specific targets for therapeutic intervention that will provide safer, more effective
treatments for malignant neoplasia patients and breast cancer patients in particular. Thus, newly
discovered breast cancer-associated genes and their products can be used as tools to develop
innovative therapies. The identification of the Her2/neu receptor kinase presents exciting new
opportunities for treatment of a certain subset of tumor patients as described before. Genes playing
important roles in any of the physiological processes outlined above can be characterized as breast
cancer targets. Genes or gene fragments identified through genomics can readily be expressed in
one or more heterologous expression systems to produce functional recombinant proteins. These
proteins are characterized in vitro for their biochemical properties and then used as tools in
high-throughput molecular screening programs to identify chemical modulators of their biochemical
activities. Modulators of target gene expression or protein activity can be identified in this manner
and subsequently tested in cellular and in vivo disease models for therapeutic activity.
Optimization of lead compounds with iterative testing in biological models and detailed
pharmacokinetic and toxicological analyses form the basis for drug development and subsequent
testing in humans.

This invention further pertains to the use of novel agents identified by the screening assays 
described above. Accordingly, it is within the scope of this invention to use a test compound 
identified as described herein in an appropriate animal model. For example, an agent identified as
described herein (e.g., a modulating agent, an antisense polynucleotide molecule, a specific
antibody, ribozyme, or a human "BREAST CANCER GENE" polypeptide binding molecule) can
be used in an animal model to determine the efficacy, toxicity, or side effects of treatment with
such an agent. Alternatively, an agent identified as described herein can be used in an animal
model to determine the mechanism of action of such an agent. Furthermore, this invention pertains
to uses of novel agents identified by the above described screening assays for treatments as
described herein. 

A reagent which affects human "BREAST CANCER GENE" activity can be administered to a
human cell, either in vitro or in vivo, to reduce or increase human "BREAST CANCER GENE"
activity. The reagent preferably binds to an expression product of a human "BREAST CANCER
GENE". If the expression product is a protein, the reagent is preferably an antibody. For treatment
of human cells ex vivo, an antibody can be added to a preparation of stem cells which have been 
removed from the body. The cells can then be replaced in the same or another human body, with or 
without clonal propagation, as is known in the art. 

In one embodiment, the reagent is delivered using a liposome. Preferably, the liposome is stable in 
the animal into which it has been administered for at least about 30 minutes, more preferably for at
least about 1 hour, and even more preferably for at least about 24 hours. A liposome comprises a
lipid composition that is capable of targeting a reagent, particularly a polynucleotide, to a
particular site in an animal, such as a human. Preferably, the lipid composition of the liposome is
capable of targeting to a specific organ of an animal, such as the lung, liver, spleen, heart, brain,
lymph nodes, and skin.

A liposome useful in the present invention comprises a lipid composition that is capable of fusing 
with the plasma membrane of the targeted cell to deliver its contents to the cell. Preferably, the 
transfection efficiency of a liposome is about 0.5 µg of DNA per 16 nmol of liposome delivered to
about 10^6 cells, more preferably about 1.0 µg of DNA per 16 nmol of liposome delivered to about
10^6 cells, and even more preferably about 2.0 µg of DNA per 16 nmol of liposome delivered to
about 10^6 cells. Preferably, a liposome is between about 100 and 500 nm, more preferably between
about 150 and 450 nm, and even more preferably between about 200 and 400 nm in diameter. 

Suitable liposomes for use in the present invention include those liposomes usually used in, for
example, gene delivery methods known to those of skill in the art. More preferred liposomes
include liposomes having a polycationic lipid composition and/or liposomes having a cholesterol
backbone conjugated to polyethylene glycol. Optionally, a liposome comprises a compound
capable of targeting the liposome to a particular cell type, such as a cell-specific ligand exposed on
the outer surface of the liposome.

Complexing a liposome with a reagent such as an antisense oligonucleotide or ribozyme can be 
achieved using methods which are standard in the art (see, for example, U.S. Patent 5,705,151). 
Preferably, from about 0.1 µg to about 10 µg of polynucleotide is combined with about 8 nmol of
liposomes, more preferably from about 0.5 µg to about 5 µg of polynucleotides are combined with
about 8 nmol liposomes, and even more preferably about 1.0 µg of polynucleotides is combined
with about 8 nmol liposomes. 
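
The loading ratios quoted above can be turned into a small bench calculation. The sketch below assumes that the quoted amounts scale linearly with batch size, which the text does not state; the helper names `liposome_nmol` and `loading_range` are hypothetical.

```python
LIPOSOME_UNIT_NMOL = 8.0  # reference liposome amount quoted in the text

def liposome_nmol(polynucleotide_ug, ug_per_unit=1.0):
    """Liposome amount (nmol) for a polynucleotide mass at a fixed loading.

    ug_per_unit is the polynucleotide mass (in micrograms) combined with
    8 nmol of liposomes; linear scaling is an assumption of this sketch.
    """
    if polynucleotide_ug <= 0 or ug_per_unit <= 0:
        raise ValueError("amounts must be positive")
    return LIPOSOME_UNIT_NMOL * polynucleotide_ug / ug_per_unit

def loading_range(ug_per_unit):
    """Which quoted range a loading (micrograms per 8 nmol liposome) falls into."""
    if 0.5 <= ug_per_unit <= 5.0:
        return "more preferred (0.5-5 ug)"
    if 0.1 <= ug_per_unit <= 10.0:
        return "preferred (0.1-10 ug)"
    return "outside the quoted ranges"
```

At the most preferred loading of about 1.0 µg per 8 nmol, for instance, 5 µg of polynucleotide would call for about 40 nmol of liposomes under the linear-scaling assumption.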

In another embodiment, antibodies can be delivered to specific tissues in vivo using receptor- 
mediated targeted delivery. Receptor-mediated DNA delivery techniques are taught in, for 
example, Findeis et al., 1993, (176); Chiou et al., 1994, (177); Wu & Wu, 1988, (178); Wu et al., 
1994, (179); Zenke et al., 1990, (180); Wu et al., 1991, (181).

Determination of a Therapeutically Effective Dose 

The determination of a therapeutically effective dose is well within the capability of those skilled 
in the art. A therapeutically effective dose refers to that amount of active ingredient which 
increases or decreases human "BREAST CANCER GENE" activity relative to the human
"BREAST CANCER GENE" activity which occurs in the absence of the therapeutically effective
dose.

For any compound, the therapeutically effective dose can be estimated initially either in cell
culture assays or in animal models, usually mice, rabbits, dogs, or pigs. The animal model also can 
be used to determine the appropriate concentration range and route of administration. Such 
information can then be used to determine useful doses and routes for administration in humans. 

Therapeutic efficacy and toxicity, e.g., ED50 (the dose therapeutically effective in 50% of the
population) and LD50 (the dose lethal to 50% of the population), can be determined by standard
pharmaceutical procedures in cell cultures or experimental animals. The dose ratio of toxic to
therapeutic effects is the therapeutic index, and it can be expressed as the ratio, LD50/ED50.
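
As a worked illustration of the ratio LD50/ED50, the sketch below computes and ranks therapeutic indices. The compound names and dose values are hypothetical and serve only to show the arithmetic.

```python
def therapeutic_index(ld50, ed50):
    """Therapeutic index as the ratio LD50/ED50 (both in the same dose units)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50

# Hypothetical (LD50, ED50) pairs in mg/kg, chosen only to illustrate the ratio.
candidates = {"compound_a": (400.0, 20.0), "compound_b": (90.0, 30.0)}

# Rank candidates so that the largest therapeutic index comes first.
ranked = sorted(candidates, key=lambda name: therapeutic_index(*candidates[name]), reverse=True)
```

Here the hypothetical compound_a has an index of 20 and compound_b an index of 3, so compound_a would be ranked first on this criterion alone.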

Pharmaceutical compositions which exhibit large therapeutic indices are preferred. The data 
obtained from cell culture assays and animal studies is used in formulating a range of dosage for
human use. The dosage contained in such compositions is preferably within a range of circulating
concentrations that include the ED50 with little or no toxicity. The dosage varies within this range
depending upon the dosage form employed, sensitivity of the patient, and the route of 
administration. 

The exact dosage will be determined by the practitioner, in light of factors related to the subject
that requires treatment. Dosage and administration are adjusted to provide sufficient levels of the
active ingredient or to maintain the desired effect. Factors which can be taken into account include
the severity of the disease state, general health of the subject, age, weight, and gender of the
subject, diet, time and frequency of administration, drug combination(s), reaction sensitivities, and
tolerance/response to therapy. Long-acting pharmaceutical compositions can be administered every
3 to 4 days, every week, or once every two weeks depending on the half-life and clearance rate of 
the particular formulation. 

Normal dosage amounts can vary from 0.1 to 100,000 micrograms, up to a total dose of about 1 g, 
depending upon the route of administration. Guidance as to particular dosages and methods of 
delivery is provided in the literature and generally available to practitioners in the art. Those
skilled in the art will employ different formulations for nucleotides than for proteins or their 
inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific to particular 
cells, conditions, locations, etc. 

If the reagent is a single-chain antibody, polynucleotides encoding the antibody can be constructed 
and introduced into a cell either ex vivo or in vivo using well-established techniques including, but
not limited to, transferrin-polycation-mediated DNA transfer, transfection with naked or
encapsulated nucleic acids, liposome-mediated cellular fusion, intracellular transportation of
DNA-coated latex beads, protoplast fusion, viral infection, electroporation, a gene gun, and 
DEAE- or calcium phosphate-mediated transfection. 

Effective in vivo dosages of an antibody are in the range of about 5 µg to about 50 µg/kg, about
50 µg to about 5 mg/kg, about 100 µg to about 500 µg/kg of patient body weight, and about 200 to
about 250 µg/kg of patient body weight. For administration of polynucleotides encoding
single-chain antibodies, effective in vivo dosages are in the range of about 100 ng to about 200 ng, 500 ng
to about 50 mg, about 1 µg to about 2 mg, about 5 µg to about 500 µg, and about 20 µg to about
100 µg of DNA.

If the expression product is mRNA, the reagent is preferably an antisense oligonucleotide or a 
ribozyme. Polynucleotides which express antisense oligonucleotides or ribozymes can be
introduced into cells by a variety of methods, as described above. 

Preferably, a reagent reduces expression of a "BREAST CANCER GENE" gene or the activity of
a "BREAST CANCER GENE" polypeptide by at least about 10, preferably about 50, more
preferably about 75, 90, or 100% relative to the absence of the reagent. The effectiveness of the
mechanism chosen to decrease the level of expression of a "BREAST CANCER GENE" gene or
the activity of a "BREAST CANCER GENE" polypeptide can be assessed using methods well
known in the art, such as hybridization of nucleotide probes to "BREAST CANCER GENE"-
specific mRNA, quantitative RT-PCR, immunologic detection of a "BREAST CANCER GENE"
polypeptide, or measurement of "BREAST CANCER GENE" activity.

In any of the embodiments described above, any of the pharmaceutical compositions of the
invention can be administered in combination with other appropriate therapeutic agents. Selection
of the appropriate agents for use in combination therapy can be made by one of ordinary skill in
the art, according to conventional pharmaceutical principles. The combination of therapeutic
agents can act synergistically to effect the treatment or prevention of the various disorders
described above. Using this approach, one may be able to achieve therapeutic efficacy with lower
dosages of each agent, thus reducing the potential for adverse side effects. 

Any of the therapeutic methods described above can be applied to any subject in need of such 
therapy, including, for example, birds and mammals such as dogs, cats, cows, pigs, sheep, goats, 
horses, rabbits, monkeys, and most preferably, humans. 



All patents and patent applications cited in this disclosure are expressly incorporated herein by
reference. The above disclosure generally describes the present invention. A more complete
understanding can be obtained by reference to the following specific examples which are provided
for purposes of illustration only and are not intended to limit the scope of the invention. 

Pharmaceutical Compositions 

The invention also provides pharmaceutical compositions which can be administered to a patient
to achieve a therapeutic effect. Pharmaceutical compositions of the invention can comprise, for
example, a "BREAST CANCER GENE" polypeptide, "BREAST CANCER GENE" polynucleotide,
ribozymes or antisense oligonucleotides, antibodies which specifically bind to a "BREAST
CANCER GENE" polypeptide, or mimetics, agonists, antagonists, or inhibitors of a "BREAST
CANCER GENE" polypeptide activity. The compositions can be administered alone or in
combination with at least one other agent, such as a stabilizing compound, which can be
administered in any sterile, biocompatible pharmaceutical carrier, including, but not limited to,
saline, buffered saline, dextrose, and water. The compositions can be administered to a patient
alone, or in combination with other agents, drugs or hormones.

In addition to the active ingredients, these pharmaceutical compositions can contain suitable 
pharmaceutically acceptable carriers comprising excipients and auxiliaries which facilitate
processing of the active compounds into preparations which can be used pharmaceutically. 
Pharmaceutical compositions of the invention can be administered by any number of routes 
including, but not limited to, oral, intravenous, intramuscular, intraarterial, intramedullary, 
intrathecal, intraventricular, transdermal, subcutaneous, intraperitoneal, intranasal, parenteral, 
topical, sublingual, or rectal means. Pharmaceutical compositions for oral administration can be
formulated using pharmaceutically acceptable carriers well known in the art in dosages suitable for 
oral administration. Such carriers enable the pharmaceutical compositions to be formulated as 
tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for 
ingestion by the patient. 

Pharmaceutical preparations for oral use can be obtained through combination of active
compounds with solid excipient, optionally grinding a resulting mixture, and processing the
mixture of granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores.
Suitable excipients are carbohydrate or protein fillers, such as sugars, including lactose, sucrose,
mannitol, or sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose, such as
methyl cellulose, hydroxypropylmethylcellulose, or sodium carboxymethylcellulose; gums
including arabic and tragacanth; and proteins such as gelatin and collagen. If desired,
disintegrating or solubilizing agents can be added, such as the cross-linked polyvinyl pyrrolidone,
agar, alginic acid, or a salt thereof, such as sodium alginate.

Dragee cores can be used in conjunction with suitable coatings, such as concentrated sugar
solutions, which also can contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel,
polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or
solvent mixtures. Dyestuffs or pigments can be added to the tablets or dragee coatings for product
identification or to characterize the quantity of active compound, i.e., dosage.

Pharmaceutical preparations which can be used orally include push-fit capsules made of gelatin, as 
well as soft, sealed capsules made of gelatin and a coating, such as glycerol or sorbitol. Push-fit 
capsules can contain active ingredients mixed with a filler or binders, such as lactose or starches, 
lubricants, such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the 
active compounds can be dissolved or suspended in suitable liquids, such as fatty oils, liquid, or
liquid polyethylene glycol with or without stabilizers. 

Pharmaceutical formulations suitable for parenteral administration can be formulated in aqueous 
solutions, preferably in physiologically compatible buffers such as Hanks' solution, Ringer's 
solution, or physiologically buffered saline. Aqueous injection suspensions can contain substances
which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol,
or dextran. Additionally, suspensions of the active compounds can be prepared as appropriate oily 
injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, 
or synthetic fatty acid esters, such as ethyl oleate or triglycerides, or liposomes. Non-lipid 
polycationic amino polymers also can be used for delivery. Optionally, the suspension also can
contain suitable stabilizers or agents which increase the solubility of the compounds to allow for
the preparation of highly concentrated solutions. For topical or nasal administration, penetrants 
appropriate to the particular barrier to be permeated are used in the formulation. Such penetrants 
are generally known in the art. 

The pharmaceutical compositions of the present invention can be manufactured in a manner that is
known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee making,
levigating, emulsifying, encapsulating, entrapping, or lyophilizing processes. The pharmaceutical
composition can be provided as a salt and can be formed with many acids, including but not
limited to, hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc. Salts tend to be more
soluble in aqueous or other protonic solvents than are the corresponding free base forms. In other
cases, the preferred preparation can be a lyophilized powder which can contain any or all of the
following: 1-50 mM histidine, 0.1%-2% sucrose, and 2-7% mannitol, at a pH range of 4.5 to 5.5, that
is combined with buffer prior to use.

Further details on techniques for formulation and administration can be found in the latest edition 
of REMINGTON'S PHARMACEUTICAL SCIENCES (182). After pharmaceutical compositions have
been prepared, they can be placed in an appropriate container and labeled for treatment of an
indicated condition. Such labeling would include amount, frequency, and method of
administration.

MATERIAL AND METHODS 

One strategy for identifying genes that are involved in breast cancer is to detect genes that are
expressed differentially under conditions associated with the disease versus non-disease
conditions. The sub-sections below describe a number of experimental systems which may be used
to detect such differentially expressed genes. In general, these experimental systems include at
least one experimental condition in which subjects or samples are treated in a manner associated
with breast cancer, in addition to at least one experimental control condition lacking such
disease-associated treatment. Differentially expressed genes are detected, as described below, by
comparing the pattern of gene expression between the experimental and control conditions.
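
The comparison of experimental and control conditions described above can be sketched in a few lines. This is only an illustration under stated assumptions: expression values are taken to be non-negative numbers per replicate, a pseudocount of 1 is added to avoid division by zero, and a two-fold change (absolute log2 fold change of 1) is used as the cut-off. The gene names, the pseudocount and the cut-off are hypothetical and are not taken from the disclosure.

```python
import math
from statistics import mean


def log2_fold_change(experimental, control, pseudocount=1.0):
    """log2 ratio of mean experimental to mean control expression."""
    return math.log2((mean(experimental) + pseudocount) / (mean(control) + pseudocount))


def differentially_expressed(profiles, min_abs_log2fc=1.0):
    """Return genes whose absolute log2 fold change reaches the cut-off.

    profiles maps a gene name to (experimental values, control values).
    """
    hits = {}
    for gene, (experimental, control) in profiles.items():
        lfc = log2_fold_change(experimental, control)
        if abs(lfc) >= min_abs_log2fc:
            hits[gene] = lfc
    return hits


# Hypothetical measurements: GENE_A is elevated in the experimental condition,
# GENE_B is unchanged.
example_profiles = {
    "GENE_A": ([7.0, 7.0], [1.0, 1.0]),
    "GENE_B": ([3.0, 3.0], [3.0, 3.0]),
}
```

In practice a statistical test across replicates would normally accompany such a cut-off; the sketch only shows the direction and magnitude of the comparison.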

Once a particular gene has been identified through the use of one such experiment, its expression 
pattern may be further characterized by studying its expression in a different experiment and the 
findings may be validated by an independent technique. Such use of multiple experiments may be
useful in distinguishing the roles and relative importance of particular genes in breast cancer. A 
combined approach, comparing gene expression pattern in cells derived from breast cancer patients 
to those of in vitro cell culture models can give substantial hints on the pathways involved in 
development and/or progression of breast cancer. 

Among the experiments which may be utilized for the identification of differentially expressed
genes involved in malignant neoplasia and breast cancer, for example, are experiments designed to 
analyze those genes which are involved in signal transduction. Such experiments may serve to 
identify genes involved in the proliferation of cells. 

Methods for the identification of genes which are involved in breast cancer are described below.
Such genes are differentially expressed in breast cancer conditions relative to their
expression in normal, or non-breast cancer, conditions or upon experimental manipulation based on
clinical observations. Such differentially expressed genes represent "target" and/or "marker" genes.
Methods for the further characterization of such differentially expressed genes, and for their
identification as target and/or marker genes, are presented below.

Alternatively, a differentially expressed gene may have its expression modulated, i.e.,
quantitatively increased or decreased, in normal versus breast cancer states, or under control versus
experimental conditions. The degree to which expression differs in normal versus breast cancer or
control versus experimental states need only be large enough to be visualized via standard
characterization techniques, such as, for example, the differential display technique described
below. Other such standard characterization techniques by which expression differences may be
visualized include but are not limited to quantitative RT-PCR and Northern analyses, which are
well known to those of skill in the art.

As part of this invention, a method is described by way of illustration and not by limitation, 
displaying at least some of the below mentioned aspects: 

1. A method for the prediction, diagnosis or prognosis of malignant neoplasia by the detection
of at least 2 markers characterized in that the markers are genes and fragments thereof or
genomic nucleic acid sequences that are located on one chromosomal region which is altered
in malignant neoplasia.

2. A method for the prediction, diagnosis or prognosis of malignant neoplasia by the detection 
of at least 2 markers characterized in that the markers are: 

a) genes that are located on one or more chromosomal region(s) which is/are altered
in malignant neoplasia; and

b)

i) receptor and ligand; or

ii) members of the same signal transduction pathway; or

iii) members of synergistic signal transduction pathways; or

iv) members of antagonistic signal transduction pathways; or

v) transcription factor and transcription factor binding site.

3. The method of aspect 1 or 2 wherein the malignant neoplasia is breast cancer, ovarian 
cancer, gastric cancer, colon cancer, esophageal cancer, mesenchymal cancer, bladder 
cancer or non-small cell lung cancer. 



4. The method of aspect 1 or 2 wherein at least one chromosomal region is defined as the
cytogenetic region: 1p13, 1q32, 3p21-p24, 5p13-p14, 8q23-q24, 11q13, 12q13, 17q12-q24
or 20q13.

5. The method of aspect 1 or 2 wherein at least one chromosomal region is defined as the
cytogenetic region 17q11.2-21.3 and the malignant neoplasia is breast cancer, ovarian
cancer, gastric cancer, colon cancer, esophageal cancer, mesenchymal cancer, bladder
cancer or non-small cell lung cancer.

6. The method of aspect 1 or 2 wherein at least one chromosomal region is defined as the
cytogenetic region 3p21-24 and the malignant neoplasia is breast cancer, ovarian cancer, 
gastric cancer, colon cancer, esophageal cancer, mesenchymal cancer, bladder cancer or 
non-small cell lung cancer. 

7. The method of aspect 1 or 2 wherein at least one chromosomal region is defined as the
cytogenetic region 12q13 and the malignant neoplasia is breast cancer, ovarian cancer,
gastric cancer, colon cancer, esophageal cancer, mesenchymal cancer, bladder cancer or
non-small cell lung cancer.

8. A method for the prediction, diagnosis or prognosis of malignant neoplasia by the
detection of at least one marker whereby the marker is a VNTR, SNP, RFLP or STS
characterized in that the marker is located on one chromosomal region which is altered in
malignant neoplasia due to amplification and the marker is detected in a cancerous and a
non-cancerous tissue or biological sample of the same individual.

9. The method of aspect 8 wherein the marker is selected from the group consisting of the
VNTRs:

D17S946, D17S1181, D17S2026, D17S838, D17S250, D17S1818, D17S614, D17S2019,
D17S608, D17S1655, D17S2147, D17S754, D17S1814, D17S2007, D17S1246,
D17S1979, D17S1984, D17S1984, D17S1867, D17S1788, D17S1836, D17S1787,
D17S1660, D17S2154, D17S1955, D17S2098, D17S518, D17S1851, D11S4358,
D17S964, D19S1091, D17S1179, D10S2160, D17S1230, D17S1338, D17S2011,
D17S1237, D17S2038, D17S2091, D17S649, D17S1190 and M87506.

10. The method of aspect 8 wherein the marker is selected from the group consisting of the
SNPs:

rs2230698, rs2230700, rs1058808, rs1801200, rs903506, rs2313170, rs1136201,
rs2934968, rs2172826, rs1810132, rs1801201, rs2230702, rs2230701, rs1126503, rs3471,
rs13695, rs471692, rs558068, rs1064288, rs1061692, rs520630, rs782774, rs565121,
rs2586112, rs532299, rs2732786, rs1804539, rs1804538, rs1804537, rs1141364, rs12231,
rs1132259, rs1132257, rs1132256, rs1132255, rs1132254, rs1132252, rs1132268 and
rs1132258.

11. A method for the prediction, diagnosis or prognosis of malignant neoplasia by the 
detection of at least one marker characterized in that the marker is selected from: 

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75, or 315 to 318;

b) a polynucleotide or polynucleotide analog which hybridizes under stringent
conditions to a polynucleotide specified in (a) and encodes a polypeptide exhibiting
the same biological function as specified for the respective sequence in Table 2 or 3

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c)

e) a purified polypeptide encoded by a polynucleotide or polynucleotide analog
sequence specified in (a) to (d)

f) a purified polypeptide comprising at least one of the sequences of SEQ ID NO: 28
to 32, 34, 35, 37 to 42, 44, 45, 47 to 52, 76 to 98, or 393 to 396;

are detected.



12. A method for the prediction, diagnosis or prognosis of malignant neoplasia by the 
detection of at least 2 markers characterized in that at least 2 markers are selected from: 

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 1 to 26 or 53 to 75 or 315 to 318;

b) a polynucleotide or polynucleotide analog which hybridizes under stringent
conditions to a polynucleotide specified in (a) and encodes a polypeptide
exhibiting the same biological function as specified for the respective sequence in
Table 2 or 3

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c)

e) a purified polypeptide encoded by a polynucleotide sequence or polynucleotide
analog specified in (a) to (d)

f) a purified polypeptide comprising at least one of the sequences of SEQ ID NO: 27
to 52 or 76 to 98 or 393 to 396

are detected.

13. The method of any of the aspects 1 to 12 wherein the detection method comprises the use
of PCR, arrays or beads.

14. A diagnostic kit comprising instructions for conducting the method of any of aspects 1 to 
13.

15. A composition for the prediction, diagnosis or prognosis of malignant neoplasia 
comprising: 

a) a detection agent for: 

i) any polynucleotide or polynucleotide analog comprising at least one of the
sequences of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75,
or 315 to 318,

ii) any polynucleotide or polynucleotide analog which hybridizes under
stringent conditions to a polynucleotide specified in (a) encoding a
polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

iii) a polynucleotide or polynucleotide analog the sequence of which deviates
from the polynucleotide specified in (a) and (b) due to the degeneracy of
the genetic code encoding a polypeptide exhibiting the same biological
function as specified for the respective sequence in Table 2 or 3

iv) a polynucleotide or polynucleotide analog which represents a specific
fragment, derivative or allelic variation of a polynucleotide sequence
specified in (a) to (c)

v) a polypeptide encoded by a polynucleotide or polynucleotide analog
sequence specified in (a) to (d);

vi) a polypeptide comprising at least one of the sequences of SEQ ID NO: 28
to 32, 34, 35, 37 to 42, 44, 45, 47 to 52, 76 to 98, or 393 to 396.

or

b) at least 2 detection agents for at least 2 markers selected from:

i) any polynucleotide comprising at least one of the sequences of SEQ ID
NO: 1 to 26 or 53 to 75 or 315 to 318;

ii) any polynucleotide which hybridizes under stringent conditions to a
polynucleotide specified in (a) encoding a polypeptide exhibiting the same
biological function as specified for the respective sequence in Table 2 or 3

iii) a polynucleotide the sequence of which deviates from the polynucleotide
specified in (a) and (b) due to the degeneracy of the genetic code encoding
a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

iv) a polynucleotide which represents a specific fragment, derivative or allelic
variation of a polynucleotide sequence specified in (a) to (c)

v) a polypeptide encoded by a polynucleotide sequence specified in (a) to (d);

vi) a polypeptide comprising at least one of the sequences of SEQ ID NO: 27
to 52 or 76 to 98 or 393 to 396.

16. An array comprising a plurality of polynucleotides or polynucleotide analogs wherein each
of the polynucleotides is selected from:

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 1 to 26 or 53 to 75 or 315 to 318;

b) a polynucleotide or polynucleotide analog which hybridizes under stringent
conditions to a polynucleotide specified in (a) encoding a polypeptide exhibiting
the same biological function as specified for the respective sequence in Table 2 or
3

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c)

attached to a solid support.

17. A method of screening for agents which regulate the activity of a polypeptide encoded by a 
polynucleotide or polynucleotide analog selected from the group consisting of: 

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75 or 315 to 318;

b) a polynucleotide or polynucleotide analog which hybridizes under stringent
conditions to a polynucleotide specified in (a) encoding a polypeptide exhibiting
the same biological function as specified for the respective sequence in Table 2 or
3

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c);

comprising the steps of:

i) contacting a test compound with at least one polypeptide encoded by a
polynucleotide specified in (a) to (d); and

ii) detecting binding of the test compound to the polypeptide, wherein a test
compound which binds to the polypeptide is identified as a potential therapeutic
agent for modulating the activity of the polypeptide in order to prevent or treat
malignant neoplasia.

18. A method of screening for agents which regulate the activity of a polypeptide encoded by a
polynucleotide or polynucleotide analog selected from the group consisting of:

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75, or 315 to 318;

b) a polynucleotide or polynucleotide analog which hybridizes under stringent
conditions to a polynucleotide specified in (a) encoding a polypeptide exhibiting
the same biological function as specified for the respective sequence in Table 2 or
3

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c)

comprising the steps of:

i) contacting a test compound with at least one polypeptide encoded by a
polynucleotide specified in (a) to (d); and

ii) detecting the activity of the polypeptide as specified for the respective sequence in
Table 2 or 3, wherein a test compound which increases the activity is identified as
a potential preventive or therapeutic agent for increasing the polypeptide activity
in malignant neoplasia, and wherein a test compound which decreases the activity
of the polypeptide is identified as a potential therapeutic agent for decreasing the
polypeptide activity in malignant neoplasia.



19. A method of screening for agents which regulate the activity of a polynucleotide or
polynucleotide analog selected from the group consisting of:

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75 or 315 to 318;

b) a polynucleotide or polynucleotide analog which hybridizes under stringent
conditions to a polynucleotide specified in (a) encoding a polypeptide exhibiting
the same biological function as specified for the respective sequence in Table 2 or
3

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c)

comprising the steps of:

i) contacting a test compound with at least one polynucleotide or polynucleotide
analog specified in (a) to (d), and

ii) detecting binding of the test compound to the polynucleotide, wherein a test
compound which binds to the polynucleotide is identified as a potential preventive
or therapeutic agent for regulating the activity of the polynucleotide in malignant
neoplasia.

20. Use of

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75 or 315 to 318;

b) a polynucleotide which hybridizes under stringent conditions to a polynucleotide
or polynucleotide analog specified in (a) encoding a polypeptide exhibiting the
same biological function as specified for the respective sequence in Table 2 or 3;

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3;

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c);

e) an antisense molecule targeting specifically one of the polynucleotide sequences
specified in (a) to (d);

f) a purified polypeptide encoded by a polynucleotide or polynucleotide analog
sequence specified in (a) to (d)

g) a purified polypeptide comprising at least one of the sequences of SEQ ID NO: 28
to 32, 34, 35, 37 to 42, 44, 45, 47 to 52, 76 to 98 or 393 to 396;

h) an antibody capable of binding to one of the polynucleotides specified in (a) to (d)
or a polypeptide specified in (f) and (g);

i) a reagent identified by any of the methods of aspects 17 to 19 that modulates the
amount or activity of a polynucleotide sequence specified in (a) to (d) or a
polypeptide specified in (f) and (g);

in the preparation of a composition for the prevention, prediction, diagnosis, prognosis or a
medicament for the treatment of malignant neoplasia.

21. Use of aspect 20 wherein the disease is breast cancer.

22. A reagent that regulates the activity of a polypeptide selected from the group consisting of: 

a) a polypeptide encoded by any polynucleotide or polynucleotide analog comprising
at least one of the sequences of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26,
53 to 75 or 315 to 318;

b) a polypeptide encoded by any polynucleotide or polynucleotide analog which
hybridizes under stringent conditions to any polynucleotide comprising at least one
of the sequences of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75 or
315 to 318 encoding a polypeptide exhibiting the same biological function as
specified for the respective sequence in Table 2 or 3

c) a polypeptide encoded by any polynucleotide or polynucleotide analog the
sequence of which deviates from the polynucleotide specified in (a) and (b) due to
the degeneracy of the genetic code encoding a polypeptide exhibiting the same
biological function as specified for the respective sequence in Table 2 or 3

d) a polypeptide encoded by any polynucleotide or polynucleotide analog which
represents a specific fragment, derivative or allelic variation of a polynucleotide
sequence specified in (a) to (c) encoding a polypeptide exhibiting the same
biological function as specified for the respective sequence in Table 2 or 3

e) or a polypeptide comprising at least one of the sequences of SEQ ID NO: 28 to 32,
34, 35, 37 to 42, 44, 45, 47 to 52, 76 to 98 or 393 to 396;

wherein said reagent is identified by the method of any of the aspects 17 to 19. 

23. A reagent that regulates the activity of a polynucleotide or polynucleotide analog selected 
from the group consisting of: 

a) a polynucleotide or polynucleotide analog comprising at least one of the sequences
of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75 or 315 to 318;

b) a polynucleotide or polynucleotide analog which hybridizes under stringent
conditions to a polynucleotide specified in (a) encoding a polypeptide exhibiting
the same biological function as specified for the respective sequence in Table 2 or
3

c) a polynucleotide or polynucleotide analog the sequence of which deviates from the
polynucleotide specified in (a) and (b) due to the degeneracy of the genetic code
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

d) a polynucleotide or polynucleotide analog which represents a specific fragment,
derivative or allelic variation of a polynucleotide sequence specified in (a) to (c)
encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

wherein said reagent is identified by the method of any of the aspects 17 to 19. 

24. A pharmaceutical composition, comprising: 

a) an expression vector containing at least one polynucleotide or polynucleotide
analog selected from the group consisting of:

i) a polynucleotide or polynucleotide analog comprising at least one of the
sequences of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to 26, 53 to 75
or 315 to 318;

ii) a polynucleotide or polynucleotide analog which hybridizes under
stringent conditions to a polynucleotide specified in (a) encoding a
polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 2 or 3

iii) a polynucleotide or polynucleotide analog the sequence of which deviates
from the polynucleotide specified in (a) and (b) due to the degeneracy of
the genetic code encoding a polypeptide exhibiting the same biological
function as specified for the respective sequence in Table 2 or 3

iv) a polynucleotide or polynucleotide analog which represents a specific
fragment, derivative or allelic variation of a polynucleotide sequence
specified in (a) to (c) encoding a polypeptide exhibiting the same
biological function as specified for the respective sequence in Table 2 or 3;

or the reagent of aspect 22 or 23 and a pharmaceutically acceptable carrier.

25. A computer-readable medium comprising:

a) at least one digitally encoded value representing a level of expression of at least
one polynucleotide sequence of SEQ ID NO: 2 to 6, 8, 9, 11 to 16, 18, 19, 21 to
26, 53 to 75 or 315 to 318

b) at least 2 digitally encoded values representing the levels of expression of at least 2
polynucleotide sequences selected from SEQ ID NO: 1 to 26, 53 to 75 or 315 to
318

in a cell from a subject at risk for or having malignant neoplasia.

26. A method for the detection of chromosomal alterations characterized in that the relative
abundance of individual mRNAs, encoded by genes, located in altered chromosomal 
regions is detected. 



27. A method for the detection of chromosomal alterations characterized in that the copy 
number of one or more chromosomal region(s) is detected by quantitative PCR,

EXAMPLE 1

Expression profiling 

a) Expression profiling utilizing quantitative RT-PCR 

For a detailed analysis of gene expression by quantitative PCR methods, one will utilize primers
flanking the genomic region of interest and a fluorescently labeled probe hybridizing in between.
Using the PRISM 7700 Sequence Detection System of PE Applied Biosystems (Perkin Elmer,
Foster City, CA, USA) with the technique of a fluorogenic probe, consisting of an oligonucleotide
labeled with both a fluorescent reporter dye and a quencher dye, one can perform such an
expression measurement. Amplification of the probe-specific product causes cleavage of the probe,
generating an increase in reporter fluorescence. Primers and probes were selected using the Primer
Express software and localized mostly in the 3' region of the coding sequence or in the 3'
untranslated region (see Table 5 for primer and probe sequences) according to the relative
positions of the probe sequence used for the construction of the Affymetrix HG-U95A-E or
HG-U133A-B DNA-chips. All primer pairs were checked for specificity by conventional PCR
reactions. To standardize the amount of sample RNA, GAPDH was selected as a reference, since it
was not differentially regulated in the samples analyzed. TaqMan validation experiments were
performed showing that the efficiencies of the target and the control amplifications are
approximately equal, which is a prerequisite for the relative quantification of gene expression by
the comparative ΔΔCT method, known to those skilled in the art.
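
The comparative ΔΔCT calculation can be sketched as follows. This is a minimal illustration, assuming amplification efficiencies of approximately 100 % for both the target and the GAPDH reference (the prerequisite stated above). The function name, the use of a calibrator sample and the example CT values are hypothetical and are not taken from the experiments described here.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of a target gene by the comparative ddCT method.

    Each CT is normalized to the reference gene (e.g. GAPDH) measured in the
    same sample; the sample is then compared to a calibrator sample.
    Assumes a doubling of product per cycle for target and reference.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical CT values: the target crosses threshold 3 cycles earlier in
# the tumor sample than in the calibrator, with equal reference CT values.
fold_change = relative_expression(24.0, 18.0, 27.0, 18.0)
print(fold_change)
```

A value above 1 indicates higher expression in the sample than in the calibrator, a value below 1 lower expression.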

As well as the technology provided by Perkin Elmer, one may use other technical implementations
such as the LightCycler™ from Roche Inc. or the iCycler from Stratagene Inc.

b) Expression profiling utilizing DNA microarrays

Expression profiling can be carried out using the Affymetrix Array Technology. By hybridization
of mRNA to such a DNA-array or DNA-chip, it is possible to identify the expression value of
each transcript from the signal intensity at a certain position of the array. Usually these DNA-arrays
are produced by spotting of cDNA, oligonucleotides or subcloned DNA fragments. In the case of
Affymetrix technology approx. 400,000 individual oligonucleotide sequences were synthesized on the
surface of a silicon wafer at distinct positions. The minimal length of oligomers is 12 nucleotides,
preferably 25 nucleotides or the full length of the transcript in question. Expression profiling may also
be carried out by hybridization to nylon or nitrocellulose membrane-bound DNA or
oligonucleotides. Detection of signals derived from hybridization may be obtained by either
colorimetric, fluorescent, electrochemical, electronic, optic or radioactive readout. Detailed
descriptions of array construction have been given above and in other patents cited. To
determine the quantitative and qualitative changes in the chromosomal region to be analyzed, RNA
from tumor tissue which is suspected to contain such genomic alterations has to be compared to
RNA extracted from benign tissue (e.g. epithelial breast tissue, or microdissected ductal tissue) on
the basis of expression profiles for the whole transcriptome. With minor modifications, the sample
preparation protocol followed the Affymetrix GeneChip Expression Analysis Manual (Santa Clara,
CA). Total RNA extraction and isolation from tumor or benign tissues, biopsies, cell isolates or
cell-containing body fluids can be performed by using TRIzol (Life Technologies, Rockville, MD)
and the Oligotex mRNA Midi kit (Qiagen, Hilden, Germany), and an ethanol precipitation step should
be carried out to bring the concentration to 1 mg/ml. 5-10 mg of mRNA is used to create
double-stranded cDNA with the Superscript system (Life Technologies). First strand cDNA synthesis was
primed with a T7-(dT24) oligonucleotide. The cDNA can be extracted with phenol/chloroform and
precipitated with ethanol to a final concentration of 1 mg/ml. From the generated cDNA, cRNA
can be synthesized using Enzo's (Enzo Diagnostics Inc., Farmingdale, NY) in vitro Transcription
Kit. Within the same step the cRNA can be labeled with the biotin nucleotides Bio-11-CTP and
Bio-16-UTP (Enzo Diagnostics Inc., Farmingdale, NY). After labeling and cleanup (Qiagen, Hilden,
Germany) the cRNA should then be fragmented in an appropriate fragmentation buffer (e.g., 40
mM Tris-Acetate, pH 8.1, 100 mM KOAc, 30 mM MgOAc, for 35 minutes at 94°C). As per the
Affymetrix protocol, fragmented cRNA should be hybridized on the HG-U133 arrays A and B,
comprising approx. 40,000 probed transcripts each, for 24 hours at 60 rpm in a 45°C hybridization
oven. After the hybridization step the chip surfaces have to be washed and stained with streptavidin
phycoerythrin (SAPE; Molecular Probes, Eugene, OR) in Affymetrix fluidics stations. To amplify
staining, a second labeling step can be introduced, which is recommended but not compulsory.
Here one should add SAPE solution twice with an antistreptavidin biotinylated antibody.
Hybridization to the probe arrays may be detected by fluorometric scanning (Hewlett Packard
Gene Array Scanner; Hewlett Packard Corporation, Palo Alto, CA).

After hybridization and scanning, the microarray images can be analyzed for quality control,
looking for major chip defects or abnormalities in hybridization signal. For this purpose either Affymetrix
GeneChip MAS 5.0 Software or other microarray image analysis software can be utilized. Primary
data analysis should be carried out by software provided by the manufacturer.

For the genes analyzed in one embodiment of this invention the primary data have been
analyzed by further bioinformatic tools and additional filter criteria. The bioinformatic analysis is
described in detail below.



According to the Affymetrix measurement technique (Affymetrix GeneChip Expression Analysis
Manual, Santa Clara, CA) a single gene expression measurement on one chip yields the average
difference value and the absolute call. Each chip contains 16-20 oligonucleotide probe pairs per
gene or cDNA clone. These probe pairs include perfectly matched sets and mismatched sets, both
of which are necessary for the calculation of the average difference, or expression value, a measure
of the intensity difference for each probe pair, calculated by subtracting the intensity of the
mismatch from the intensity of the perfect match. This takes into consideration variability in
hybridization among probe pairs and other hybridization artifacts that could affect the fluorescence
intensities. The average difference is a numeric value supposed to represent the expression value
of that gene. The absolute call can take the values 'A' (absent), 'M' (marginal), or 'P' (present)
and denotes the quality of a single hybridization. We used both the quantitative information given
by the average difference and the qualitative information given by the absolute call to identify the
genes which are differentially expressed in biological samples from individuals with breast cancer
versus biological samples from the normal population. With algorithms other than the Affymetrix
one we have obtained different numerical values representing the same expression values and
expression differences upon comparison.
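
The average difference described above can be illustrated with a short sketch. It is a simplification: it takes the plain mean of the perfect match minus mismatch differences over all probe pairs of one gene, whereas the manufacturer's software may apply additional steps such as outlier handling. The function name and the intensity values are hypothetical.

```python
def average_difference(perfect_match, mismatch):
    """Mean of (perfect match - mismatch) intensities over all probe pairs."""
    if len(perfect_match) != len(mismatch) or not perfect_match:
        raise ValueError("need equally sized, non-empty intensity lists")
    differences = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(differences) / len(differences)


# Hypothetical intensities for three probe pairs of one probe set.
pm_intensities = [100.0, 200.0, 300.0]
mm_intensities = [50.0, 100.0, 150.0]
print(average_difference(pm_intensities, mm_intensities))
```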

The differential expression E in one of the breast cancer groups compared to the normal population
is calculated as follows. Given n average difference values d_1, d_2, ..., d_n in the breast cancer
population and m average difference values c_1, c_2, ..., c_m in the population of normal individuals, it
is computed by the equation:

If d_j<50 or c_i<50 for one or more values of i and j, these particular values c_i and/or d_j are set to an
"artificial" expression value of 50. This particular computation of E allows for a correct
comparison to TaqMan results.

A gene is called up-regulated in breast cancer versus normal if E ≥ 1.5 and if the number of absolute
calls equal to 'P' in the breast cancer population is greater than n/2.

A gene is called down-regulated in breast cancer versus normal if E ≤ 1.5 and if the number of
absolute calls equal to 'P' in the normal population is greater than m/2.
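
The flooring of low values and the present-call filter can be sketched as follows. The equation for E is not reproduced in this text, so E is taken as an input here rather than computed. The sketch assumes that the threshold for up-regulation reads E ≥ 1.5 and that the present call is coded 'P'; all function names are hypothetical.

```python
def floor_values(values, floor=50.0):
    """Set every average difference value below the floor to the floor."""
    return [max(value, floor) for value in values]


def majority_present(absolute_calls):
    """True if more than half of the absolute calls are 'P' (present)."""
    present = sum(1 for call in absolute_calls if call == "P")
    return present > len(absolute_calls) / 2


def is_up_regulated(e_value, cancer_calls, threshold=1.5):
    """Up-regulation call: E at or above threshold and a majority of 'P'
    calls in the breast cancer population (assumed reading of the rule)."""
    return e_value >= threshold and majority_present(cancer_calls)


print(floor_values([12.0, 50.0, 75.5]))
print(is_up_regulated(2.0, ["P", "P", "A"]))
```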

The final list of differentially regulated genes consists of all up-regulated and all down-regulated
genes in biological samples from individuals with breast cancer versus biological samples from the
normal population. Those genes on this list which are interesting for a pharmaceutical application
were finally validated by TaqMan. If a good correlation between the expression values/behavior of
a transcript could be observed with both techniques, such a gene is listed in Tables 1 to 3.

Since not only the information on differential expression of a single gene within an identified
ARCHEON, but also the information on the co-regulation of several members is important for
predictive, diagnostic, preventive and therapeutic purposes, we have combined expression data with
information on the chromosomal position (e.g. golden path) taken from publicly available databases
to develop a picture of the overall transcriptome of a given tumor sample. By this technique not
only known or suspected regions of genomes can be inspected but, even more valuable, new
regions of dysregulation with chromosomal linkage can be identified. This is of value in other types
of neoplasia or viral integration and chromosomal rearrangements. By SQL-based database
searches one can retrieve information on expression, qualitative value of a measurement (denoted
by Affymetrix MAS 5.0 Software), expression values derived from techniques other than
DNA-chip hybridization and chromosomal linkage.
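
An SQL-based search of this kind might look like the sketch below, which uses an in-memory SQLite database. The table layout, column names, probe set identifiers, positions and values are all hypothetical and serve only to show how expression values, absolute calls and chromosomal position can be queried together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE expression (
           probe_set  TEXT,
           sample     TEXT,
           chromosome TEXT,
           position   INTEGER,  -- start position on the genome assembly
           avg_diff   REAL,     -- average difference value
           abs_call   TEXT      -- 'A', 'M' or 'P'
       )"""
)
rows = [
    ("probe_A", "tumor_1", "17", 35000000, 820.0, "P"),
    ("probe_B", "tumor_1", "17", 35100000, 40.0, "A"),
    ("probe_C", "tumor_1", "17", 35200000, 1500.0, "P"),
    ("probe_D", "tumor_1", "12", 6500000, 900.0, "P"),
]
conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?, ?, ?)", rows)


def present_probes_in_region(connection, sample, chromosome, start, end):
    """Probe sets called 'P' in a chromosomal window, ordered by position."""
    cursor = connection.execute(
        "SELECT probe_set, position, avg_diff FROM expression "
        "WHERE sample = ? AND chromosome = ? "
        "AND position BETWEEN ? AND ? AND abs_call = 'P' "
        "ORDER BY position",
        (sample, chromosome, start, end),
    )
    return cursor.fetchall()


print(present_probes_in_region(conn, "tumor_1", "17", 34000000, 36000000))
```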

EXAMPLE 2 

Identification of the ARCHEON 

a) Identification and localization of genes or gene probes (represented by the so-called probe
sets on Affymetrix arrays HG-U95A-E or HG-U133A-B) in their chromosomal context and
order on the human genome.

For identification of larger chromosomal changes or aberrations, as they have been described in
detail above, a sufficient number of genes, transcripts or DNA-fragments is needed. In the case of
array-based CGH the density of probes covering a chromosomal region is not necessarily limited to
the transcribed genes, but when RNA is utilized as probe material the density is given by the
distance of genes on a chromosome. The DNA-microarrays provided by Affymetrix Inc.
contain all hitherto known transcripts of the human genome, which are represented by
40,000 - 60,000 probe sets. By BLAST mapping and sorting the sequences of these short DNA-oligomers
to the publicly available sequence of the human genome represented by the so-called
"golden path", available at the University of California in Santa Cruz or from the NCBI, a
chromosomal display of the whole transcriptome of a tissue specimen evolves. By graphical
display of the individual chromosomal regions and color coding of over- or under-represented
transcripts, compared to a reference transcriptome, regions with DNA gains and losses can be
identified.
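
The ordering and flagging step can be sketched as follows. The sketch assumes that each probe set has already been mapped to a chromosome and position, and it flags a transcript as over- or under-represented from its ratio to a reference transcriptome. The ratio cut-offs, the function name and the data are hypothetical and are not values stated in this document.

```python
from collections import defaultdict


def chromosomal_display(probes, reference, gain=1.5, loss=1 / 1.5):
    """Group probe sets by chromosome, sort them by position and flag each
    as 'over', 'under' or 'unchanged' relative to a reference transcriptome.

    probes:    iterable of (probe_set, chromosome, position, expression)
    reference: dict mapping probe_set to its reference expression value
    """
    by_chromosome = defaultdict(list)
    for probe_set, chromosome, position, value in probes:
        ratio = value / reference[probe_set]
        if ratio >= gain:
            status = "over"
        elif ratio <= loss:
            status = "under"
        else:
            status = "unchanged"
        by_chromosome[chromosome].append((position, probe_set, status))
    return {chrom: sorted(entries) for chrom, entries in by_chromosome.items()}


sample_probes = [
    ("p2", "17", 200, 300.0),
    ("p1", "17", 100, 100.0),
    ("p3", "12", 50, 40.0),
]
reference_values = {"p1": 100.0, "p2": 100.0, "p3": 100.0}
print(chromosomal_display(sample_probes, reference_values))
```

Runs of neighbouring probe sets with the same flag along one chromosome would then point to a candidate region of gain or loss.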

b) Quantification of gene copy numbers by combined IHC and quantitative PCR (PCR
karyotyping) or directly by quantitative PCR

Usually one to three paraffin-embedded tissue sections that are 5 µm thick are used to obtain
genomic DNA from the samples. Tissue sections are stained by colorimetric IHC after
deparaffinization to identify regions containing disease-associated cells. Stained regions are
macrodissected with a scalpel and transferred into a microcentrifuge tube. The genomic DNA of
these isolated tissue sections is extracted using appropriate buffers. The isolated DNA is then used
for quantitative PCR with appropriate primers and probes. Optionally the IHC staining can be
omitted and the genomic DNA can be directly isolated with or without prior deparaffinization with
appropriate buffers. Those who are skilled in the art may vary the conditions and buffers described
below to obtain equivalent results.

Reagents from DAKO (HercepTest, Code No. K 5204) and TaKaRa (Biomedicals, Cat. No. 9091) were used
according to the manufacturers' protocols.

It is convenient to prepare the following reagents prior to staining: 

Solution No. 7

Epitope Retrieval Solution (Citrate buffer + antimicrobial agent) (10x conc.)
20 ml ad 200 ml aqua dest. (stable for 1 month at 2-8°C)

Solution No. 8

Washing buffer (Tris-HCl + antimicrobial agent) (10x conc.)
30 ml ad 300 ml distilled water (stable for 1 month at 2-8°C)

Staining solution: DAB

1 ml solution is sufficient for 10 slides. The solution was prepared immediately before use:

1 ml DAB buffer (Substrate Buffer solution, pH 7.5, containing H2O2, stabilizer, enhancers and an
antimicrobial agent) + 1 drop (25-30 µl) DAB-Chromogen (3,3'-diaminobenzidine chromogen
solution). This solution is stable for up to 5 days at 2-8°C. Precipitated substances do not influence 
the staining result. Additionally required are: 2 x approx. 100 ml xylene, 2 x approx. 100 ml ethanol
100%, 2 x ethanol 95%, aqua dest. These solutions can be used for up to 40 stainings. A water bath
is required for the epitope retrieval step.

Staining procedure: 

All reagents are pre-warmed to room temperature (20-25°C) prior to immunostaining. Likewise all
incubations were performed at room temperature, except the epitope retrieval, which is performed
in a 95°C water bath. Between the steps excess liquid is tapped off from the slides with lint-free
tissue (Kim Wipe).

Deparaffinization 

Slides are placed in a xylene bath and incubated for 5 minutes. The bath is changed and the step
repeated once. Excess liquid is tapped off and the slides are placed in absolute ethanol for 3
minutes. The bath is changed and the step repeated once. Excess liquid is tapped off and the
slides are placed in 95% ethanol for 3 minutes. The bath is changed and the step repeated once.
Excess liquid is tapped off and the slides are placed in distilled water for a minimum of 30
seconds.

Epitope Retrieval

Staining jars are filled with diluted epitope retrieval solution and preheated in a water bath at
95°C. The deparaffinized sections are immersed into the preheated solution in the staining jars and
incubated for 40 minutes at 95°C. The entire jar is removed from the water bath and allowed to
cool down at room temperature for 20 minutes. The epitope retrieval solution is decanted, the
sections are rinsed in distilled water and finally soaked in wash buffer for 5 minutes.

Peroxidase Blocking

Excess buffer is tapped off and the tissue section encircled with a DAKO pen. The specimen is
covered with 3 drops (100 µl) Peroxidase-Blocking solution and incubated for 5 minutes. The
slides are rinsed in distilled water and placed into a fresh washing buffer bath.

Antibody Incubation

Excess liquid is tapped off and the specimens are covered with 3 drops (100 µl) of Anti-Her-2/neu
reagent (Rabbit Anti-Human Her2 Protein in 0.05 mol/L Tris/HCl, 0.1 mol/L NaCl,
15 mmol/L NaN3, pH 7.2, containing stabilizing protein) or negative control reagent (= IgG fraction
of normal rabbit serum at an equivalent protein concentration as the Her2 Ab). After 30 minutes of
incubation the slide is rinsed in water and placed into a fresh water bath.

Excess liquid is tapped off and the specimens are covered with 3 drops (100 µl) of visualization
reagent. After 30 minutes of incubation the slide is rinsed in water and placed into a fresh water
bath. Excess liquid is tapped off and the specimens are covered with 3 drops (100 µl) of
Substrate-Chromogen solution (DAB) for 10 minutes. After rinsing the specimen with distilled
water, photographs are taken with a conventional Olympus microscope to document the staining
intensity and tumor regions within the specimen. Optionally a counterstain with hematoxylin was
performed.

DNA extraction 

The whole specimens or dissected subregions are transferred into microcentrifuge tubes.
Optionally a small amount (10 µl) of preheated TaKaRa solution (DEXPAT™) is
placed onto the specimen to facilitate sample transfer with a scalpel. 50 to 150 µl of TaKaRa
solution were added to the samples depending on the size of the tissue sample selected. The
samples are incubated at 100°C for 10 minutes in a block heater, followed by centrifugation at
12,000 rpm in a microcentrifuge. The supernatant is collected using a micropipette and placed in a
separate microcentrifuge tube. If no deparaffinization step has been undertaken one has to be sure
not to withdraw tissue debris and resin. Genomic DNA left in the pellet can be collected by adding
resin-free TaKaRa buffer and an additional heating and centrifugation step. Samples are stored at
-20°C.

Genomic DNA from different tumor cell lines (MCF-7, BT-20, BT-474, SKBR-3, AU-565,
UACC-812, UACC-893, HCC-1008, HCC-2157, HCC-1954, HCC-2218, HCC-1937, HCC-1599,
SW480), or from lymphocytes is prepared with the QIAamp® DNA Mini Kits or the QIAamp®
DNA Blood Mini Kits according to the manufacturer's protocol. Usually between 1 ng and 1 µg
DNA is used per reaction.

Quantitative PCR

To measure the gene copy number of the genes within the patient samples the respective
primer/probe mixes (see table below) are prepared by mixing 25 µl of the 100 µM stock solution "Upper
Primer", 25 µl of the 100 µM stock solution "Lower Primer" with 12.5 µl of the 100 µM stock
solution TaqMan Probe (Quencher TAMRA) and adjusted to 500 µl with aqua dest. For each
reaction 1.25 µl DNA extract of the patient samples or 1.25 µl DNA from the cell lines were
mixed with 8.75 µl nuclease-free water and added to one well of a 96-well Optical Reaction Plate
(Applied Biosystems Part No. 4306737). 1.5 µl Primer/Probe mix, 12.5 µl TaqMan Universal PCR
Mix (2x) (Applied Biosystems Part No. 4318157) and 1 µl water are then added. The 96-well
plates are closed with 8 Caps/Strips (Applied Biosystems Part Number 4323032) and centrifuged
for 3 minutes. Measurements of the PCR reaction are done according to the instructions of the
manufacturer with a TaqMan 7900 HT from Applied Biosystems (No. 20114) under appropriate
conditions (2 min. 50°C, 10 min. 95°C, 0.15 min. 95°C, 1 min. 60°C; 40 cycles). Software SDS 2.0
from Applied Biosystems is used according to the respective instructions. CT values are then
further analyzed with appropriate software (Microsoft Excel™).
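
The pipetting scheme above implies the final primer and probe concentrations in each well, which can be checked with a short calculation. The sketch assumes a master mix volume of 12.5 µl per well, which is the volume at which a 2x mix reaches 1x in the resulting 25 µl reaction. The function name is hypothetical.

```python
def final_concentration_nM(stock_uM, stock_volume_ul, mix_volume_ul,
                           mix_used_ul, reaction_volume_ul):
    """Final concentration (nM) of one oligonucleotide in the PCR well."""
    # Concentration in the primer/probe mix after dilution of the stock.
    mix_concentration_uM = stock_uM * stock_volume_ul / mix_volume_ul
    # Dilution of the mix into the reaction, converted from uM to nM.
    return mix_concentration_uM * mix_used_ul / reaction_volume_ul * 1000.0


# DNA + water + primer/probe mix + 2x master mix (assumed 12.5 ul) + water
reaction_volume = 1.25 + 8.75 + 1.5 + 12.5 + 1.0

primer_nM = final_concentration_nM(100.0, 25.0, 500.0, 1.5, reaction_volume)
probe_nM = final_concentration_nM(100.0, 12.5, 500.0, 1.5, reaction_volume)
print(reaction_volume, round(primer_nM, 1), round(probe_nM, 1))
```

Under these assumptions each primer ends up at about 300 nM and the probe at about 150 nM in a 25 µl reaction.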

EXAMPLE 3 

Clinical samples of patients being treated with Herceptin and a chemotherapeutic agent (e.g.
docetaxel, paclitaxel, taxotere, carboplatin, cisplatin, oxaliplatin, vinorelbine) as a second line
therapy have been obtained. These samples included formalin-fixed and paraffin-embedded
material from primary tumours and metastatic lesions of the respective patients. However, the
determination of the ARCHEON genes as disclosed in this invention has also been performed
from fresh tissue after nucleic acid extraction in an independent, neoadjuvant setting. Moreover,
whole blood, serum and plasma samples were available for multiple patients.

Multiparametric, clinical assessment of the response to Herceptin in combination with
chemotherapeutics (e.g. docetaxel, taxotere, paclitaxel, vinorelbine, carboplatin, cisplatin), or other
therapies described below, was performed. Clinical information included histological parameters
(TNM stage, AJCC grade), standard molecular markers (IHC staining for estrogen receptor,
progesterone receptor, Her-2/neu) and sonographical or radiological assessment (e.g. CT). Response
to treatment was evaluated according to international standards, i.e. modified WHO criteria and
RECIST criteria. Each cancer evaluation in the course of the disease was documented (method and
date of evaluation, organ, anatomical description, measurability, size of lesion (longest diameter),
greatest perpendicular diameter, tumor area). Moreover, each systemic anticancer therapy
including prior chemotherapy with anthracyclines (Doxorubicin or Epirubicin) and/or CMF and the
response thereto was evaluated (drug, intent, duration, schedule, number of cycles, cumulative
dose). To assess the response to combinatory treatment of metastatic breast cancer patients with Herceptin
and chemotherapeutics as second line treatment the modified WHO criteria were used. In addition
the initial disease-free survival, duration of response and time to progression were taken into
consideration. For definition of treatment response standard criteria were used: "Complete
Response" ("CR" = tumor shrinkage of 100 % with no residual disease being clinically detectable),
"Partial Response" ("PR" = tumor shrinkage of target lesion of at least 50%), "Stable Disease"
("SD" = tumor shrinkage of less than 50 % or no change) and "Progressive Disease" ("PD" =
tumor growth or new tumor lesions).
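
The four response categories can be expressed as a simple decision rule. The sketch follows the definitions as worded above and is not a full implementation of the WHO or RECIST criteria, which use additional thresholds and confirmation rules. The function and parameter names are hypothetical; shrinkage is given in percent, with negative values meaning tumor growth.

```python
def classify_response(shrinkage_percent, new_lesions=False,
                      residual_disease=True):
    """Assign CR, PR, SD or PD from tumor shrinkage of the target lesion."""
    if new_lesions or shrinkage_percent < 0:
        return "PD"  # tumor growth or new tumor lesions
    if shrinkage_percent >= 100 and not residual_disease:
        return "CR"  # complete shrinkage, no clinically detectable disease
    if shrinkage_percent >= 50:
        return "PR"  # shrinkage of at least 50 %
    return "SD"      # shrinkage of less than 50 % or no change


for case in [(100, False, False), (60, False, True), (10, False, True),
             (-20, False, True), (70, True, True)]:
    print(case, classify_response(*case))
```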

More than 70 genes were analyzed according to the method disclosed in Example 2 by combined
IHC and quantitative PCR or directly by quantitative PCR after nucleic acid extraction from the
formaldehyde-fixed, paraffin-embedded tissue slides. Results were reconfirmed by independent
methodology (VNTR and SNP detection). Alterations of the 43 ARCHEON genes were
determined by comparison with reference genes that are located on the same chromosome (=
intrachromosomal control) or on different chromosomes (= extrachromosomal control).
Intrachromosomal reference genes included MMP28, hKa3 and K20. Extrachromosomal reference
genes included GAPDH for chromosome 12. However, any other gene not included in the
ARCHEONs disclosed in this invention can be used as reference gene for ARCHEON
characterization. The reference genes should be independent of the ARCHEON alterations
occurring in the neoplastic lesions and should not be affected by chromosomal alterations such as
amplifications and deletions. As gene copy numbers of non-amplified genes can be increased in
neoplastic lesions due to genomic imbalances such as aneuploidy or polyploidy, each
measurement of ARCHEON genes was correlated to multiple reference genes to minimize the
influence of genomic imbalances on the relative copy number calculation. Moreover, minor
systematic errors occurring due to differences in the performance of individual primer/probe pairs
were minimized by determining primer/probe performances in control tissues (i.e. non-neoplastic
tissues from healthy controls) and euploid control cell lines (e.g. HS68, ATCC #CRL1635).
Moreover, one well characterized control cell line was used that displays aneuploidy for a single
chromosome (i.e. Detroit, ATCC #CCL-54; trisomy 21). By measuring genes located on the
X-chromosome (e.g. Xist), the Y-chromosome (e.g. SRY) and on chromosome 21, defined copy
numbers of 1, 2 and 3 genes could be determined as internal control during each run for
standardization. In addition, synthetic targets were spiked into some reactions; these consisted of the
target region of the PCR forward and reverse primers of the gene to be normalized, but in between
contained a synthetic probe hybridization region different from the original probe region of the
target gene to be normalized. This allowed internal standardization of each individual qPCR
reaction by multiplex PCR. The calculated performance differences were used as a filter for the
measurements within the target tissues, i.e. primer/probe differences of each individual gene as
determined in the control cells and tissues were subtracted from each individual gene measurement

performed in the target tissue. Thereafter, the individual, filtered CT values were normalized to the
different reference genes. Differences between the CT values of the quantitative PCR reactions of
the ARCHEON genes and the reference genes remaining after filtering the primer/probe
performance differences were determined and transformed into "copy numbers per cell". This was
done by subtracting the CT values of the reference genes from the CT values of the target genes.
The resulting ΔCT values were then transformed into gene copy numbers, with the ΔCT value of the
reference gene (ΔCT=0) being defined as "2 copies per cell", by the following formula:
2*(2^(ΔCT*(-1))). All the calculations were done using standard software (Microsoft Excel™).
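
The transformation of CT values into copy numbers per cell can be sketched as follows. The sketch assumes the sign convention ΔCT = CT(target) - CT(reference), so that a target crossing threshold earlier than the reference yields more than 2 copies per cell. Averaging over several reference genes is one possible way to combine them and is an assumption, as are the function and parameter names.

```python
def copies_per_cell(ct_target, ct_references, performance_offset=0.0):
    """Gene copy number per cell from CT values.

    ct_target:          CT of the target (ARCHEON) gene in the target tissue
    ct_references:      CT values of one or more reference genes
    performance_offset: primer/probe performance difference measured in
                        control cells, subtracted from the target CT
    """
    filtered_ct = ct_target - performance_offset
    estimates = []
    for ct_reference in ct_references:
        delta_ct = filtered_ct - ct_reference
        # delta_ct == 0 is defined as 2 copies per cell.
        estimates.append(2.0 * 2.0 ** (-delta_ct))
    return sum(estimates) / len(estimates)


print(copies_per_cell(25.0, [25.0]))        # same CT as the reference
print(copies_per_cell(23.0, [25.0, 25.0]))  # two cycles earlier
```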

References 
Patents cited 



U.S. Pat. No. 5,593,839

U.S. Pat. No. 5,578,832

U.S. Pat. No. 5,556,752

U.S. Pat. No. 5,631,734

U.S. Pat. No. 5,599,695

U.S. Pat. No. 4,683,195

U.S. Pat. No. 5,498,531

U.S. Pat. No. 5,714,331

U.S. Pat. No. 5,641,673 Haseloff et al.

U.S. Pat. No. 5,223,409 Lander, E.

U.S. Pat. No. 5,976,813 Beutel et al.

U.S. Pat. No. 5,283,317

U.S. Pat. No. 6,203,987

U.S. Pat. No. 6,379,895

WO 97/29212

WO 97/27317

WO 95/22058

WO 99/12826

WO 97/02357

WO 94/13804

WO 94/10300

WO 97/14028

WO 99/52708

EP 0 785 280

EP 0 799 897

EP 0 728 520

EP 0 721 016

EP 0 321 201

U.S. Pat. No. 4,843,155 Chomczynski, P.

U.S. Pat. No. 5,262,31 Liang, P., and Pardee, A. B., 1993

U.S. Pat. No. 4,683,202 Mullis, K. B., 1987



WO 2005/047534 PCT/EP2004/011599 


GB2188638B 
Publications cited 





(1) Gusterson et al., Journal of Clinical Oncology 10, 1049-1056, 1992

(2) Achuthan et al., Cancer Genet Cytogenet 130: 166-72, 2001

(3) Tomasetto et al., FEBS Lett 373: 245-249, 1995

(4) Pragnell et al., FEBS Lett 291: 253-258, 1991

(5) Nakamichi et al. 1986

(6) Feo et al., Proc. Nat. Acad. Sci. 86: 6691-6695, 1989

(7) Davies et al., Proc. Nat. Acad. Sci. 86: 6691-6695, 1989

(8) Lee et al., Molec. Endocr. 9: 243-254, 1995

(9) Drane et al., Oncogene 15: 3013-3024, 1997

(10) Zhu et al., J. Biol. Chem. 272: 25500-25506, 1997

(11) Yuan et al., Proc. Nat. Acad. Sci. 95: 7939-7944, 1998

(12) Zhu et al., Proc. Nat. Acad. Sci. 96: 10848-10853, 1999

(13) Lee et al., Science 268: 836-844, 1995

(14) McCormick et al., Molec. Cell. Biol. 16: 5792-5800, 1996

(15) Tamimi et al., Genomics 40: 355-357, 1997

(16) Valle et al., FEBS Lett. 415: 163-168, 1997

(17) Kaneda et al., J. Biol. Chem. 263: 7672-7677, 1988

(18) Hoehe et al., Hum. Molec. Genet. 1: 175-178, 1992

(19) Yang-Feng et al., Abstract Cytogenet. Cell Genet. 40: 784, 1985

(20) Coussens et al., Science 230: 1132-1139, 1985

(21) van de Vijver et al., New Eng. J. Med. 319: 1239-1245, 1988

(22) Slamon et al., Science 244: 707-712, 1989

(23) Fukushige et al., Res. Commun. 134: 477-483, 1986

(24) Kaneko et al., Jpn. J. Cancer Res. 78: 16-19, 1987

(25) Di Fiore et al., Science 237: 178-182, 1987

(26) Popescu et al., Genomics 4: 362-366, 1989

(27) Qiu et al., Nature 393: 83-85, 1998

(28) Yu et al., Molec. Cell 2: 581-591, 1998

(29) Doherty et al., Proc. Nat. Acad. Sci. 96: 10869-10874, 1999

(30) Slamon et al., New Eng. J. Med. 344: 783-792, 2001

(31) Margolis et al., J. Clin. Invest. 102: 821-827, 1998

(32) Tanaka et al., J. Clin. Invest. 102: 821-827, 1998

(33) Dong et al., J. Biol. Chem. 272: 29104-29112, 1997

(34) Stein et al., EMBO J. 13: 1331-40, 1994

(35) Nagata et al., Nature 319: 415-418, 1986

(36) Le Beau et al., Leukemia 1: 795-799, 1987

(37) Jansson et al., EMBO J. 2: 561-565, 1983

(38) Thompson et al., Science 237: 1610-1614, 1987

(39) Nakai et al., Proc. Nat. Acad. Sci. 85: 2781-2785, 1988

(40) Miyajima et al., Cell 57: 31-39, 1989

(41) Debuire et al., Science 224: 1456-1459, 1984

(42) Petkovich et al., Nature 330: 444-450, 1987

(43) Mattei et al., Hum. Genet. 80: 186-188, 1988

(44) Williams et al., Molec. Cell. Biol. 18: 2758-2767, 1998

(45) Saha et al., Molec. Cell. Biol. 18: 2758-2767, 1998

(46) Yan et al., Proc. Nat. Acad. Sci. 95: 3603-3608, 1998

(47) Singh et al., Nucleic Acids Res. 16: 3919-3929, 1988

(48) Tsai-Pflugfelder et al., Proc. Nat. Acad. Sci. 85: 7177-7181, 1988

(49) Chung et al., Proc. Nat. Acad. Sci. 86: 9431-9435, 1989

(50) Lang et al., Gene 221: 255-266, 1998

(51) Keith et al., Genes Chromosomes Cancer 4: 169-175, 1992

(52) Kingsmore et al., Mammalian Genome 4: 288-289, 1993

(53) Watt et al., Biochem. J. 303: 681-695, 1994

(54) Kiefer et al., J. Biol. Chem. 267: 12692-12699, 1992

(55) Shimasaki et al., Molec. Endocr. 4: 1451-1458, 1990

(56) Zazzi et al., Genomics 49: 401-410, 1998

(57) Bajalica et al., Hum. Genet. 89: 234-236, 1992

(58) Tonin et al., Genomics 18: 414-417, 1993

(59) Birkenbach et al., J. Virol. 67: 2209-2220, 1993

(60) Wang et al., Proc. Nat. Acad. Sci. 95: 492-498, 1998

(61) Klochendler-Yeivin et al., Curr Opin Genet Dev 12(1): 73-9, 2002

(62) King et al., Genomics 51: 140-3, 1998

(63) Darmon et al., Molec. Biol. Rep. 12: 277-283, 1987

(64) Zhou et al., J. Biol. Chem. 263: 15584-15589, 1988

(65) Korge et al., Proc. Nat. Acad. Sci. 89: 910-914, 1992

(66) Lessin et al., J. Invest. Derm. 91: 572-578, 1988

(67) Romano et al., Cytogenet. Cell Genet. 58: 2009-2010, 1991

(68) Fuchs et al., Proc. Nat. Acad. Sci. 89: 6906-6910, 1992

(69) Rogaev et al., Nature Genet. 5: 158-162, 1993

(70) Liu et al., Curr. Eye Res. 12: 963-974, 1993




(71) Nishida et al., Invest. Ophthal. Vis. Sci. 37: 1800-1809, 1996

(72) Nishida et al., Am. J. Hum. Genet. 61: 1268-1275, 1997

(73) Meesmann and Wilke, 1939

(74) Corden et al., Exp. Eye Res. 70: 41-49, 2000

(75) van de Vijver et al., Mol Cell Biol 7, 2091-23, 1987

(76) Offterdinger et al., Biochem Biophys Res Comm 251, 907-13, 1988

(77) Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2d ed., 1989

(78) Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, N.Y., 1989

(79) Tedder, T.F. et al., Proc. Natl. Acad. Sci. U.S.A. 85:208-212, 1988

(80) Hedrick, S. M. et al., Nature 308:149-153, 1984

(81) Lee, S. W. et al., Proc. Natl. Acad. Sci. U.S.A. 88:4225, 1984

(82) Sarkar, PCR Methods Applic. 2, 318-322, 1993

(83) Triglia et al., Nucleic Acids Res. 16, 81-86, 1988

(84) Lagerstrom et al., PCR Methods Applic. 1, 111-119, 1991

(85) Copeland & Jenkins, Trends in Genetics 7: 113-118, 1991

(86) Cohen et al., Nature 366: 698-701, 1993

(87) Bonner et al., J. Mol. Biol. 81, 123, 1973

(88) Bolton and McCarthy, Proc. Natl. Acad. Sci. U.S.A. 48, 1390, 1962

(89) Plump et al., Cell 71: 343-353, 1992

(90) Altschul et al., Bull. Math. Bio. 48:603, 1986

(91) Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915, 1992

(92) Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988

(93) Pearson et al., Meth. Enzymol. 183:63, 1990

(94) Needleman & Wunsch, J. Mol. Biol. 48:444, 1970

(95) Sellers, SIAM J. Appl. Math. 26:787, 1974

(96) Takamatsu, EMBO J. 6, 307-311, 1987

(97) Coruzzi et al., EMBO J. 3, 1671-1680, 1984

(98) Broglie et al., Science 224, 838-843, 1984

(99) Winter et al., Results Probl. Cell Differ. 17, 85-105, 1991

(100) Engelhard et al., Proc. Nat. Acad. Sci. 91, 3224-3227, 1994

(101) Logan & Shenk, Proc. Natl. Acad. Sci. 81, 3655-3659, 1984

(102) Scharf et al., Results Probl. Cell Differ. 20, 125-162, 1994

(103) Freshney R.I., ed., ANIMAL CELL CULTURE, 1986

(104) Wigler et al., Cell 11, 223-232, 1977

(105) Lowy et al., Cell 22, 817-823, 1980



(106) Wigler et al., Proc. Natl. Acad. Sci. 77, 3567-3570, 1980

(107) Colbere-Garapin et al., J. Mol. Biol. 150, 1-14, 1981

(108) Hartman & Mulligan, Proc. Natl. Acad. Sci. 85, 8047-8051, 1988

(109) Rhodes et al., Methods Mol. Biol. 55, 121-131, 1995

(110) Hampton et al., SEROLOGICAL METHODS: A LABORATORY MANUAL, APS Press, St. Paul, Minn., 1990

(111) Maddox et al., J. Exp. Med. 158, 1211-1216, 1983

(112) Porath et al., Prot. Exp. Purif. 3, 263-281, 1992

(113) Kroll et al., DNA Cell Biol. 12, 441-453, 1993

(114) Caruthers et al., Nucl. Acids Res. Symp. Ser. 215-223, 1980

(115) Horn et al., Nucl. Acids Res. Symp. Ser. 225-232, 1980

(116) Merrifield, J. Am. Chem. Soc. 85, 2149-2154, 1963

(117) Roberge et al., Science 269, 202-204, 1995

(118) Creighton, PROTEINS: STRUCTURES AND MOLECULAR PRINCIPLES, WH and Co., New York, N.Y., 1983

(119) Cronin et al., Human Mutation 7:244, 1996

(120) Landegran et al., Science 241:1077-1080, 1988

(121) Nakazawa et al., PNAS 91:360-364, 1994

(122) Abravaya et al., Nuc Acid Res 23:675-682, 1995

(123) Guatelli, J.C. et al., Proc. Natl. Acad. Sci. USA 87: 1874-1878, 1990

(124) Kwoh, D.Y. et al., Proc. Natl. Acad. Sci. USA 86: 1173-1177, 1989

(125) Lizardi, P.M. et al., Bio/Technology 6: 1197, 1988

(126) Brown, Meth. Mol. Biol. 20, 1-8, 1994

(127) Sonveaux, Meth. Mol. Biol. 26, 1-72, 1994

(128) Uhlmann et al., Chem. Rev. 90, 543-583, 1990

(129) Gee et al., in Huber & Carr, MOLECULAR AND IMMUNOLOGIC APPROACHES, Publishing Co., Mt. Kisco, N.Y., 1994

(130) Agrawal et al., Trends Biotechnol. 10, 152-158, 1992

(131) Uhlmann et al., Tetrahedron. Lett. 215, 3539-3542, 1987

(132) Cech, Science 236, 1532-1539, 1987

(133) Cech, Ann. Rev. Biochem. 59, 543-568, 1990

(134) Couture & Stinchcomb, Trends Genet. 12, 510-515, 1996

(135) Haseloff et al., Nature 334, 585-591, 1988

(136) Kohler et al., Nature 256, 495-497, 1985

(137) Kozbor et al., J. Immunol. Methods 81, 31-42, 1985

(138) Cote et al., Proc. Natl. Acad. Sci. 80, 2026-2030, 1983



(139) Cole et al., Mol. Cell Biol. 62, 109-120, 1984

(140) Morrison et al., Proc. Natl. Acad. Sci. 81, 6851-6855, 1984

(141) Neuberger et al., Nature 312, 604-608, 1984

(142) Takeda et al., Nature 314, 452-454, 1985

(143) Burton, Proc. Natl. Acad. Sci. 88, 11120-11123, 1991

(144) Thirion et al., Eur. J. Cancer Prev. 5, 507-11, 1996

(145) Coloma & Morrison, Nat. Biotechnol. 15, 159-63, 1997

(146) Mallender & Voss, J. Biol. Chem. 269, 199-206, 1994

(147) Verhaar et al., Int. J. Cancer 61, 497-501, 1995

(148) Nicholls et al., J. Immunol. Meth. 165, 81-91, 1993

(149) Orlandi et al., Proc. Natl. Acad. Sci. 86, 3833-3837, 1989

(150) Winter et al., Nature 349, 293-299, 1991

(151) Lam, Anticancer Drug Des. 12, 145, 1997

(152) DeWitt et al., Proc. Natl. Acad. Sci. U.S.A. 90, 6909, 1993

(153) Erb et al., Proc. Natl. Acad. Sci. U.S.A. 91, 11422, 1994

(154) Zuckermann et al., J. Med. Chem. 37, 2678, 1994

(155) Cho et al., Science 261, 1303, 1993

(156) Carell et al., Angew. Chem. Int. Ed. Engl. 33, 2059 & 2061, 1994

(157) Gallop et al., J. Med. Chem. 37, 1233, 1994

(158) Houghten, BioTechniques 13, 412-421, 1992

(159) Lam, Nature 354, 82-84, 1991

(160) Fodor, Nature 364, 555-556, 1993

(161) Cull et al., Proc. Natl. Acad. Sci. U.S.A. 89, 1865-1869, 1992

(162) Scott & Smith, Science 249, 386-390, 1990

(163) Devlin, Science 249, 404-406, 1990

(164) Cwirla et al., Proc. Natl. Acad. Sci. 97, 6378-6382, 1990

(165) Felici, J. Mol. Biol. 222, 301-310, 1991

(166) Jayawickreme et al., Proc. Natl. Acad. Sci. U.S.A. 19, 1614-1618, 1994

(167) Chelsky, Strategies for Screening Combinatorial Libraries, 1995

(168) Salmon et al., Molecular Diversity 2, 57-63, 1996

(169) McConnell et al., Science 257, 1906-1912, 1992

(170) Sjolander & Urbaniczky, Anal. Chem. 63, 2338-2345, 1991

(171) Szabo et al., Curr. Opin. Struct. Biol. 5, 699-705, 1995

(172) Zervos et al., Cell 72, 223-232, 1993

(173) Madura et al., J. Biol. Chem. 268, 12046-12054, 1993

(174) Bartel et al., BioTechniques 14, 920-924, 1993



(175) Iwabuchi et al., Oncogene 8, 1693-1696, 1993

(176) Findeis et al., Trends in Biotechnol. 11, 202-205, 1993

(177) Chiou et al., GENE THERAPEUTICS: METHODS AND APPLICATIONS OF DIRECT GENE TRANSFER, J. A. Wolff, ed., 1994

(178) Wu & Wu, J. Biol. Chem. 263, 621-24, 1988

(179) Wu et al., J. Biol. Chem. 269, 542-46, 1994

(180) Zenke et al., Proc. Natl. Acad. Sci. U.S.A. 87, 3655-59, 1990

(181) Wu et al., J. Biol. Chem. 266, 338-42, 1991

(182) Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa.

(183) Hille, Excitable Membranes, Sunderland, MA, Sinauer Associates, Inc.



Table 1

DNA SEQ ID NO | Protein SEQ ID NO | Genbank ID | Unigene_v162_ID | Locus Link ID | Gene Name
1 | 27 | NM_006148.1 | Hs.334851 | 3927 | LASP1
2 | 28 | NM_000723.1 | Hs.635 | 782 | CACNB1
3 | 29 | NM_000981.1 | Hs.381061 | 6143 | RPL19
4 | 30 | Y13467 | Hs.15589 | 5469 | PPARGBP
5 | 31 | NM_016507.1 | Hs.416108 | 51755 | CrkRS/CRK7
6 | 32 | AB021742.1 | Hs.322431 | 4761 | NEUROD2
7 | 33 | NM_006804.1 | Hs.77628 | 10948 | MLN64/STARD3
8 | 34 | NM_003673.1 | Hs.343603 | 8557 | TELETHONIN
9 | 35 | NM_002686.1 | Hs.1892 | 5409 | PNMT
10 | 36 | X03363.1 | Hs.446352 | 2064 | ERBB2
11 | 37 | AB008790.1 | Hs.86859 | 2886 | GRB7
12 | 38 | NM_002809.1 | Hs.9736 | 5709 | PSMD3
13 | 39 | NM_000759.1 | Hs.2233 | 1440 | GCSFG/CSF3
14 | 40 | AI023317; NM_014815 | Hs.23106 | 9862 | KIAA0130/TRAP100
15 | 41 | X55005 | Hs.724 | 7067 | c-erbA-1/THRA
16 | 42 | X72631 | Hs.2769166 | 9572 | NR1D1
17 | 43 | NM_007359.1 | Hs.83422 | 22794 | MLN51
18 | 44 | U77949.1 | Hs.405958 | 990 | CDC6
19 | 45 | U41742.1; NM_000964 | Hs.361071 | 5914 | RARA
20 | 46 | NM_001067.1 | Hs.156346 | 7153 | TOP2A
21 | 47 | NM_001552.1 | Hs.1516 | 3487 | IGFBP4
22 | 48 | NM_001838.1 | Hs.1652 | 1236 | CCR7 EBI1
23 | 49 | NM_003079.1 | Hs.437546 | 6605 | SMARCE1 BAF57
24 | 50 | X14487 | Hs.99936 | 3858 | KRT10
25 | 51 | NM_000223.1 | Hs.66739 | 3859 | KRT12
26 | 52 | NM_002279.2 | Hs.32950 | 3884 | KRTHA3B
53 | 76 | NM_005937 | Hs.497128 | 4302 | MLLT6
54 | 77 | XM_008147/NM_007144 | Hs.371617 | 7703 | ZNF144/RNF110
55 | 78 | NM_138687 | Hs.9605 | 8396 | PIP5K2B
56 | 79 | NM_020405 | Hs.125036 | 57125 | TEM7/PLXDC1
57 | 80 | AF129512 | Hs.258579 | 22806 | ZNFN1A3
58 | 81 | XM_085731; NM_133264 | Hs.421622 | 147179 | WIRE
59 | 82 | NM_002795 | Hs.82793 | 5691 | PSMB3
60 | 83 | NM_033419 | Hs.91668 | 93210 | MGC9753 Variant a/CAB2
61 | 84 | NM_033419 | Hs.91668 | 93210 | MGC9753 Variant c
62 | 85 | NM_033419 | Hs.91668 | 93210 | MGC9753 Variant d
63 | 86 | NM_033419 | Hs.91668 | 93210 | MGC9753 Variant e
64 | 87 | NM_033419 | Hs.91668 | 93210 | MGC9753 Variant g
65 | 88 | NM_033419 | Hs.91668 | 93210 | MGC9753 Variant h
66 | 89 | NM_033419 | Hs.91668 | 93210 | MGC9753 Variant i
67 | 90 | AF395708 | Hs.133167 | 94103 | ORMDL3
68 | 91 | NM_032875 | Hs.194498 | 84961 | MGC15482
69 | 92 | NM_032192 | Hs.286192 | 84152 | PPP1R1B
70 | 93 | NM_032339 | Hs.333526 | 84299 | MGC14832
71 | 94 | NM_057555; NM_139280 | Hs.133167 | 51242 | LOC51242/ORMDL3
72 | 95 | NM_017748 | Hs.406223 | 54883 | FLJ20291
73 | 96 | NM_018530 | Hs.306777 | 55876 | Pro2521
74 | 97 | NM_016339 | Hs.158530 | 51195 | Link-GEFII
75 | 98 | NM_032865 | Hs.99037 | 84951 | CTEN
315 | 393 | XM_294897 | Hs.270564 | 30837 | NAP4
316 | 394 | NM_032351 | Hs.19347 | 84311 | MRPL45
317 | 395 | NM_000458 | Hs.408093 | 6928 | TCF2
318 | 396 | NM_152300 | Hs.380430 | 11056 | ROK1
319 | 397 | NM_019010 | Hs.84905 | 54474 | KRT20
320 | 398 | NM_173213 | Hs.9029 | 25984 | KRT23
321 | 399 | NM_033185 | Hs.307025 | 85293 | KRTAP3-3
322 | 400 | NM_031959 | Hs.307026 | 83897 | KRTAP3-2
323 | 401 | NG_000941 | | 85345 | KRTAP3P1
324 | 402 | NM_031958 | Hs.307027 | 83896 | KRTAP3-1
325 | 403 | NM_031957 | Hs.307030 | 83895 | KRTAP1-5
326 | 404 | NM_030966 | Hs.247935 | 81850 | KRTAP1-3
327 | 405 | NM_030967 | Hs.247934 | 81851 | KRTAP1-1
328 | 406 | AJ302536 | | 85296 | KRTAP2-2
329 | 407 | NM_033184 | | 85294 | KRTAP2-4
330 | 408 | NG_000939 | | 85343 | KRTAP2P1
331 | 409 | NM_033061 | Hs.380164 | 85287 | KRTAP4-7
332 | 410 | NM_033059 | Hs.307015 | 85282 | KRTAP4-14
333 | 411 | NM_031854 | Hs.307016 | 83755 | KRTAP4-12
334 | 412 | NM_033188 | Hs.307016 | 83755 | KRTAP4-5
335 | 413 | NM_033186 | | 85283 | KRTAP4-13
336 | 414 | NM_032524 | Hs.307022 | 84616 | KRTAP4-4
337 | 415 | NM_033062 | Hs.380165 | 85291 | KRTAP4-2
338 | 416 | NM_033060 | Hs.380165 | 85291 | KRTAP4-10
339 | 417 | NM_031961 | Hs.307013 | 83899 | KRTAP9-2
340 | 418 | NM_031962 | Hs.307012 | 83900 | KRTAP9-3
341 | 419 | NM_031963 | Hs.307011 | 83901 | KRTAP9-8
342 | 420 | NM_030975 | Hs.307010 | 81870 | KRTAP9-9
343 | 421 | NM_033191 | | 85280 | KRTAP9-4
344 | 422 | NG_000942 | | 85347 | KRTAP9P1
345 | 423 | XM_210345 | Hs.463016 | 85276 | KRTAP16-1
346 | 424 | NM_031964 | Hs.307009 | 83902 | KRTAP17-1
347 | 425 | NM_004138 | Hs.197874 | 3883 | KRTHA3A
348 | 426 | NM_002279 | Hs.32950 | 3884 | KRTHA3B
349 | 427 | NM_021013 | Hs.296942 | 3885 | KRTHA4
350 | 428 | NM_002277 | Hs.41696 | 3881 | KRTHA1
351 | 429 | Y16795 | | 8686 | KRTHAP1
352 | 430 | NM_003770 | Hs.159403 | 8688 | KRTHA7
353 | 431 | NM_006771 | Hs.248188 | 8687 | KRTHA8
354 | 432 | NM_002278 | Hs.41752 | 3882 | KRTHA2
355 | 433 | NM_002280 | Hs.73082 | 3886 | KRTHA5
356 | 434 | NM_003771 | Hs.248189 | 8689 | KRTHA6
357 | 435 | NM_002274 | Hs.433871 | 3860 | KRT13
358 | 436 | NM_002275 | Hs.80342 | 3866 | KRT15
359 | 437 | NM_002276 | Hs.309517 | 3880 | KRT19
360 | 438 | NM_000226 | Hs.2783 | 3857 | KRT9
361 | 439 | NM_000526 | Hs.355214 | 3861 | KRT14
362 | 440 | NM_005557 | Hs.432448 | 3868 | KRT16
363 | 441 | NM_000422 | Hs.2785 | 3872 | KRT17
364 | 442 | NM_005556 | Hs.23881 | 3855 | KRT7
365 | 443 | NG_000944 | | 85349 | KRTHBP4
366 | 444 | NG_000943 | | 85348 | KRTHBP3
367 | 445 | NM_002281 | Hs.170925 | 3887 | KRTHB1
368 | 446 | NM_002284 | Hs.278658 | 3892 | KRTHB6
369 | 447 | NM_002282 | Hs.182506 | 3889 | KRTHB3
370 | 448 | NG_000940 | | 85344 | KRTHBP2
371 | 449 | NM_002283 | Hs.182507 | 3891 | KRTHB5
372 | 450 | NM_033045 | Hs.272336 | 3890 | KRTHB4
373 | 451 | NM_033033 | Hs.134640 | 3888 | KRTHB2
374 | 452 | Y19213 | | 85340 | KRTHBP1
375 | 453 | NM_005555 | Hs.432677 | 3854 | KRT6B
376 | 454 | NM_173086 | Hs.446417 | 286887 | KRT6E
377 | 455 | NM_058242 | | 140446 | KRT6C
378 | 456 | NM_005554 | Hs.367762 | 3853 | KRT6A
379 | 457 | NM_000424 | Hs.433845 | 3852 | KRT5
380 | 458 | NM_033448 | Hs.55278 | 112802 | KRT6IRS
381 | 459 | NM_175053 | Hs.56255 | 121391 | KRT6IRS4
382 | 460 | NM_080747/AY033495 | Hs.147040 | 140807 | K6IRS2/KRT6
383 | 461 | NM_175068 | Hs.319101 | 55410 | KRT6IRS3
384 | 462 | NM_000423 | Hs.707 | 3849 | KRT2A
385 | 463 | NM_006121 | Hs.80828 | 3848 | KRT1
386 | 464 | NM_057088 | Hs.410397 | 3850 | KRT3
387 | 465 | NM_002272 | Hs.371139 | 3851 | KRT4
388 | 466 | NM_002273 | Hs.356123 | 3856 | KRT8
389 | 467 | NM_000224 | Hs.406013 | 3875 | KRT18
390 | 468 | NM_032950 | Hs.380710 | 79148 | MMP28
391 | 469 | NM_005419 | Hs.72988 | 6773 | STAT2
392 | 470 | NM_002046 | Hs.169476 | 2597 | GAPDH

Table 2

DNA SEQ ID NO | Gene description
1 | Member of a subfamily of LIM proteins that contains a LIM domain and an SH3 (Src homology region 3) domain
2 | Beta 1 subunit of a voltage-dependent calcium channel (dihydropyridine receptor), involved in coupling of excitation and contraction in muscle, also acts as a calcium channel in various other tissues
3 | Ribosomal protein L19, component of the large 60S ribosomal subunit
4 | Protein with similarity to nuclear receptor-interacting proteins; binds and co-activates the nuclear receptors PPARalpha (PPARA), RARalpha (RARA), RXR, TRbeta1, and VDR
5 | we26e02.x1 CDC2-related protein kinase 7
6 | Neurogenic differentiation, a basic-helix-loop-helix transcription factor that mediates neuronal differentiation
7 | Protein that is overexpressed in malignant tissues, contains a putative transmembrane region and a StAR Homology Domain (SHD), may function in steroidogenesis and contribute to tumor progression
8 | Telethonin, a sarcomeric protein specifically expressed in skeletal and heart muscle, caps titin (TTN) and is important for structural integrity of the sarcomere
9 | Phenylethanolamine N-methyltransferase, acts in catecholamine biosynthesis to convert norepinephrine to epinephrine
10 | Tyrosine kinase receptor that has similarity to the EGF receptor, a critical component of IL-6 signaling through the MAP kinase pathway, overexpression associated with prostate, ovary and breast cancer
11 | Growth factor receptor-bound protein, an SH2 domain-containing protein that has isoforms which may have a role in cell invasion and metastatic progression of esophageal carcinomas
12 | Non-ATPase subunit of the 26S proteasome (prosome, macropain)
13 | Granulocyte colony stimulating factor, a glycoprotein that regulates growth, differentiation, and survival of neutrophilic granulocytes
14 | Member of the Vitamin D Receptor Interacting Protein co-activator complex, has strong similarity to thyroid hormone receptor-associated protein (murine Trap100) which functions as a transcriptional coregulator
15 | Thyroid hormone receptor alpha, a high affinity receptor for thyroid hormone that activates transcription; homologous to avian erythroblastic leukemia virus oncogene
16 | encoding Rev-ErbAalpha; nuclear receptor subfamily 1, group D, member 1
17 | Protein that is overexpressed in breast carcinomas
18 | Protein which interacts with the DNA replication proteins PCNA and Orc1, translocates from the nucleus following onset of S phase; S. cerevisiae homolog Cdc6p is required for initiation of S phase
19 | Retinoic acid receptor alpha, binds retinoic acid and stimulates transcription in a ligand-dependent manner
20 | DNA topoisomerase II alpha, member of a family of proteins that relieves torsional stress created by DNA replication, transcription, and cell division
21 | Insulin-like growth factor binding protein, the major IGFBP of osteoblast-like cells, binds IGF 1 and IGF 2 and inhibits their effects on promoting DNA and glycogen synthesis in osteoblastic cells
22 | HUMEBI103 G protein-coupled receptor (EBI 1) gene exon 3 chemokine (C-C motif) receptor 7 G protein-coupled receptor
23 | Protein with an HMG 1/2 DNA-binding domain that is subunit of the SWI/SNF complex associated with the nuclear matrix and implicated in regulation of transcription by affecting chromatin structure
24 | Keratin 10, a type I keratin that is a component of intermediate filaments and is expressed in terminally differentiated epidermal cells; mutation of the corresponding gene causes epidermolytic hyperkeratosis
25 | Keratin 12, a component of intermediate filaments in corneal epithelial cells; mutation of the corresponding gene causes Meesmann corneal dystrophy
26 | Hair keratin 3B, a type I keratin that is a member of a family of structural proteins that form intermediate filaments
53 | MLLT6 Myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila); translocated to, 6
54 | zinc finger protein 144 (Mel-18)
55 | Phosphatidylinositol-4-phosphate 5-kinase type II beta isoform a
56 | tumor endothelial marker 7 precursor
57 | zinc finger protein, subfamily 1A, 3
58 | WASP-binding protein; putative cr16 and wip like protein similar to Wiskott-Aldrich syndrome protein
59 | Proteasome (prosome, macropain) subunit, beta type 3
60 | Predicted
67 | ORM1-like 3 (S. cerevisiae)
68 | F-box domain; A Receptor for Ubiquitination Targets
69 | protein phosphatase 1, regulatory (inhibitor) subunit 1B (dopamine and cAMP regulated phosphoprotein, DARPP-32)
70 | Predicted Protein
71 | Predicted Protein
72 | Predicted Protein
73 | Predicted Protein
74 | Link-GEFII: Link guanine nucleotide exchange factor II
75 | C-terminal tensin-like
315 | Homo sapiens Nck, Ash and phospholipase C binding protein (NAP4)
316 | Homo sapiens mitochondrial ribosomal protein L45 (MRPL45), nuclear gene encoding mitochondrial protein
317 | Homo sapiens transcription factor 2, hepatic; LF-B3; variant hepatic nuclear factor (TCF2), transcript variant a
318 | Homo sapiens DEAD (Asp-Glu-Ala-Asp) box polypeptide 52 (DDX52)
319 | Homo sapiens keratin 20 (KRT20), is a component of intermediate filament network
320 | Homo sapiens keratin 23 (histone deacetylase inducible) (KRT23), transcript variant 2, is a component of intermediate filament network
321 | Homo sapiens keratin associated protein 3-3 (KRTAP3-3), is a component of intermediate filament network
322 | Homo sapiens keratin associated protein 3-2 (KRTAP3-2), is a component of intermediate filament network
323 | Homo sapiens keratin associated protein 3 pseudogene 1 (KRTAP3P1) on chromosome 17, is a component of intermediate filament network
324 | Homo sapiens keratin associated protein 3-1 (KRTAP3-1), is a component of intermediate filament network
325 | Homo sapiens keratin associated protein 1-5 (KRTAP1-5), is a component of intermediate filament network
326 | Homo sapiens keratin associated protein 1-3 (KRTAP1-3), is a component of intermediate filament network
327 | Homo sapiens keratin associated protein 1-1 (KRTAP1-1), is a component of intermediate filament network
328 | HSA302536 Homo sapiens partial mRNA for keratin associated protein KAP2.2 (KRTAP2.2 gene), is a component of intermediate filament network
329 | Homo sapiens keratin associated protein 2-4 (KRTAP2-4), is a component of intermediate filament network
330 | Homo sapiens keratin associated protein 2 pseudogene 1 (KRTAP2P1) on chromosome 17, is a component of intermediate filament network
331 | Homo sapiens keratin associated protein 4-7 (KRTAP4-7), is a component of intermediate filament network
332 | Homo sapiens keratin associated protein 4-14 (KRTAP4-14), is a component of intermediate filament network
333 | Homo sapiens keratin associated protein 4-12 (KRTAP4-12), is a component of intermediate filament network
334 | Homo sapiens keratin associated protein 4-5 (KRTAP4-5), is a component of intermediate filament network
335 | Homo sapiens keratin associated protein 4-13 (KRTAP4-13), is a component of intermediate filament network
336 | Homo sapiens keratin associated protein 4-4 (KRTAP4-4), is a component of intermediate filament network
337 | Homo sapiens keratin associated protein 4-2 (KRTAP4-2), is a component of intermediate filament network
338 | Homo sapiens keratin associated protein 4-10 (KRTAP4-10), is a component of intermediate filament network
339 | Homo sapiens keratin associated protein 9-2 (KRTAP9-2), is a component of intermediate filament network
340 | Homo sapiens keratin associated protein 9-3 (KRTAP9-3), is a component of intermediate filament network
341 | Homo sapiens keratin associated protein 9-8 (KRTAP9-8), is a component of intermediate filament network
342 | Homo sapiens keratin associated protein 9-9 (KRTAP9-9), is a component of intermediate filament network
343 | Homo sapiens keratin associated protein 9-4 (KRTAP9-4), is a component of intermediate filament network
344 | Homo sapiens keratin associated protein 9 pseudogene 1 (KRTAP9P1) on chromosome 17, is a component of intermediate filament network
345 | Homo sapiens keratin associated protein 16-1 (KRTAP16-1), is a component of intermediate filament network
346 | Homo sapiens keratin associated protein 17-1 (KRTAP17-1), is a component of intermediate filament network
347 | Homo sapiens keratin, hair, acidic, 3A (KRTHA3A), is a component of intermediate filament network
348 | Homo sapiens keratin, hair, acidic, 3B (KRTHA3B), is a component of intermediate filament network
349 | Homo sapiens keratin, hair, acidic, 4 (KRTHA4), is a component of intermediate filament network
350 | Homo sapiens keratin, hair, acidic, 1 (KRTHA1), is a component of intermediate filament network
351 | HSA16795 Homo sapiens KRTHAP1 pseudogene, is a component of intermediate filament network
352 | Homo sapiens keratin, hair, acidic, 7 (KRTHA7), is a component of intermediate filament network
353 | Homo sapiens keratin, hair, acidic, 8 (KRTHA8), is a component of intermediate filament network
354 | Homo sapiens keratin, hair, acidic, 2 (KRTHA2), is a component of intermediate filament network
355 | Homo sapiens keratin, hair, acidic, 5 (KRTHA5), is a component of intermediate filament network
356 | Homo sapiens keratin, hair, acidic, 6 (KRTHA6), is a component of intermediate filament network
357 | Homo sapiens keratin 13 (KRT13), transcript variant 2, is a component of intermediate filament network
358 | Homo sapiens keratin 15 (KRT15), is a component of intermediate filament network
359 | Homo sapiens keratin 19 (KRT19), is a component of intermediate filament network
360 | Homo sapiens keratin 9 (epidermolytic palmoplantar keratoderma) (KRT9), is a component of intermediate filament network
361 | Homo sapiens keratin 14 (epidermolysis bullosa simplex, Dowling-Meara, Koebner) (KRT14), is a component of intermediate filament network
362 | Homo sapiens keratin 16 (focal non-epidermolytic palmoplantar keratoderma) (KRT16), is a component of intermediate filament network
363 | Homo sapiens keratin 17 (KRT17), is a component of intermediate filament network
364 | Homo sapiens keratin 7 (KRT7), is a component of intermediate filament network
365 | Homo sapiens psihHbD hair keratin pseudogene (KRTHBP4) on chromosome 12, is a component of intermediate filament network
366 | Homo sapiens psihHbC hair keratin pseudogene (KRTHBP3) on chromosome 12, is a component of intermediate filament network
367 | Homo sapiens keratin, hair, basic, 1 (KRTHB1), is a component of intermediate filament network
368 | Homo sapiens keratin, hair, basic, 6 (monilethrix) (KRTHB6), is a component of intermediate filament network
369 | Homo sapiens keratin, hair, basic, 3 (KRTHB3), is a component of intermediate filament network
370 | Homo sapiens psihHbB hair keratin pseudogene (KRTHBP2) on chromosome 12, is a component of intermediate filament network
371 | Homo sapiens keratin, hair, basic, 5 (KRTHB5), is a component of intermediate filament network
372 | Homo sapiens keratin, hair, basic, 4 (KRTHB4), is a component of intermediate filament network
373 | Homo sapiens keratin, hair, basic, 2 (KRTHB2), is a component of intermediate filament network
374 | HSPSIHHBA Homo sapiens putative psihHbA pseudogene for hair keratin, exons 2 to 7
375 | Homo sapiens keratin 6B (KRT6B), is a component of intermediate filament network
376 | Homo sapiens keratin 6E (KRT6E), is a component of intermediate filament network
377 | Homo sapiens keratin 6C (KRT6C), is a component of intermediate filament network
378 | Homo sapiens keratin 6A (KRT6A), is a component of intermediate filament network
379 | Homo sapiens keratin 5 (epidermolysis bullosa simplex, Dowling-Meara/Kobner/Weber-Cockayne types) (KRT5), is a component of intermediate filament network
380 | Homo sapiens keratin 6 irs (KRT6IRS), is a component of intermediate filament network
381 | Homo sapiens keratin 6 irs4 (KRT6IRS4), is a component of intermediate filament network
382 | Homo sapiens keratin protein K6irs (K6IRS2), is a component of intermediate filament network
383 | Homo sapiens keratin protein K6irs (KRT6IRS3), is a component of intermediate filament network
384 | Homo sapiens keratin 2A (epidermal ichthyosis bullosa of Siemens) (KRT2A), is a component of intermediate filament network
385 | Homo sapiens keratin 1 (epidermolytic hyperkeratosis) (KRT1), is a component of intermediate filament network
386 | Homo sapiens keratin 3 (KRT3), is a component of intermediate filament network
387 | Homo sapiens keratin 4 (KRT4), is a component of intermediate filament network
388 | Homo sapiens keratin 8 (KRT8), is a component of intermediate filament network
389 | Homo sapiens keratin 18 (KRT18), is a component of intermediate filament network
390 | Homo sapiens matrix metalloproteinase 28 (MMP28), transcript variant 2
391 | Homo sapiens signal transducer and activator of transcription 2, 113kDa (STAT2)
392 | Homo sapiens glyceraldehyde-3-phosphate dehydrogenase (GAPD)

DNA SEQ ID NO | Gene function
1 | SH3/SH2 adapter protein
2 | voltage-gated calcium channel membrane fraction Channel [passive transporter]
3 | RNA binding structural protein of ribosome protein biosynthesis
4 | transcription co-activator nucleus Pol II transcription
6 | transcription factor transcription regulation from Pol II promoter
7 | mitochondrial transport steroid and lipid metabolism
8 | structural protein of muscle sarcomere alignment
9 | phenylethanolamine N-methyltransferase Transferase
10 | Neu/ErbB-2 receptor receptor signaling protein tyrosine kinase
11 | SH3/SH2 adapter protein EGF receptor signaling pathway
12 | 26S proteasome Protein degradation Proteasome subunit
13 | developmental processes positive control of cell proliferation
14 | fatty acid omega-hydroxylase fatty acid omega-hydroxylase
15 | DNA-binding protein Transcription factor
16 | steroid hormone receptor transcription co-repressor
18 | nucleotide binding cell cycle regulator DNA replication checkpoint regulation of CDK activity
19 | retinoic acid receptor transcription co-activator transcription factor

[Subcellular localization column illegible in the source scan; the legible fragments read plasma membrane, cytoplasm, nucleus and extracellular space]

DNA SEQ ID NO | Gene function | Subcellular localization
20 | DNA binding DNA topoisomerase (ATP-hydrolyzing) | nucleus
21 | skeletal development DNA metabolism signal transduction cell proliferation |
22 | | plasma membrane
23 | chromatin binding transcription co-activator nucleosome disassembly transcription | nucleus nuclear chromosome
24 | Cell structure Cytoskeletal Epidermal Development and Maintenance | cytoplasm
25 | structural protein vision cell shape and cell size control intermediate filament | cytoplasm
26 | cell shape and cell size control Cell structure |
53 | leucine-zipper containing fusion |
56 | Tumor endothelial marker 7 precursor; may be involved in angiogenesis |
57 | Aiolos; DNA binding protein that may be a transcription factor; has strong similarity to murine Znfn1a3, contains zinc finger domain |
58 | The WASP-binding protein WIRE has a role in the regulation of the actin filament system downstream of the platelet-derived growth factor receptor |

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ill ular local iz 




Cytoplasm 


Cytoplasm 


cytoplasm 


cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


cytoplasm 


cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


Cytoplasm 


cytoplasm 


Subci 










































Gene function

KRTAP3-2, integral part of the intermediate filamentous network
KRTAP3P1, integral part of the intermediate filamentous network
KRTAP3-1, integral part of the intermediate filamentous network
KRTAP1-5, integral part of the intermediate filamentous network
KRTAP1-3, integral part of the intermediate filamentous network
KRTAP1-1, integral part of the intermediate filamentous network
KRTAP2-2, integral part of the intermediate filamentous network
KRTAP2-4, integral part of the intermediate filamentous network
KRTAP2P1, integral part of the intermediate filamentous network
KRTAP4-7, integral part of the intermediate filamentous network
KRTAP4-14, integral part of the intermediate filamentous network
KRTAP4-12, integral part of the intermediate filamentous network
KRTAP4-5, integral part of the intermediate filamentous network
KRTAP4-13, integral part of the intermediate filamentous network
KRTAP4-4, integral part of the intermediate filamentous network
KRTAP4-2, integral part of the intermediate filamentous network
KRTAP4-10, integral part of the intermediate filamentous network
KRTAP9-2, integral part of the intermediate filamentous network
KRTAP9-3, integral part of the intermediate filamentous network
KRTAP9-8, integral part of the intermediate filamentous network


DNA SEQ ID NO: [illegible]










































Subcellular localization

Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
[illegible]
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm










































Gene function

KRTAP9-9, integral part of the intermediate filamentous network
KRTAP9-4, integral part of the intermediate filamentous network
KRTAP9P1, integral part of the intermediate filamentous network
KRTAP16-1, integral part of the intermediate filamentous network
KRTAP17-1, integral part of the intermediate filamentous network
KRTHA3A, integral part of the intermediate filamentous network
KRTHA3B, integral part of the intermediate filamentous network
KRTHA4, integral part of the intermediate filamentous network
KRTHA1, integral part of the intermediate filamentous network
KRTHAP1, integral part of the intermediate filamentous network
KRTHA7, integral part of the intermediate filamentous network
KRTHA8, integral part of the intermediate filamentous network
KRTHA2, integral part of the intermediate filamentous network
KRTHA5, integral part of the intermediate filamentous network
KRTHA6, integral part of the intermediate filamentous network
KRT13, integral part of the intermediate filamentous network
KRT15, integral part of the intermediate filamentous network
KRT19, integral part of the intermediate filamentous network
KRT9, integral part of the intermediate filamentous network
KRT14, integral part of the intermediate filamentous network


DNA SEQ ID NO: [illegible]




Subcellular localization

Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
[illegible]
Cytoplasm
Cytoplasm
Cytoplasm
[illegible]
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm












































Gene function

KRT16, integral part of the intermediate filamentous network
KRT17, integral part of the intermediate filamentous network
KRT7, integral part of the intermediate filamentous network
KRTHBP4, integral part of the intermediate filamentous network
KRTHBP3, integral part of the intermediate filamentous network
KRTHB1, integral part of the intermediate filamentous network
KRTHB6, integral part of the intermediate filamentous network
KRTHB3, integral part of the intermediate filamentous network
KRTHBP2, integral part of the intermediate filamentous network
KRTHB5, integral part of the intermediate filamentous network
KRTHB4, integral part of the intermediate filamentous network
KRTHB2, integral part of the intermediate filamentous network
KRTHBP1, integral part of the intermediate filamentous network
KRT6B, integral part of the intermediate filamentous network
KRT6E, integral part of the intermediate filamentous network
KRT6C, integral part of the intermediate filamentous network
KRT6A, integral part of the intermediate filamentous network
KRT5, integral part of the intermediate filamentous network
KRT6IRS, integral part of the intermediate filamentous network
KRT6IRS4, integral part of the intermediate filamentous network


DNA SEQ ID NO: [illegible]




















Subcellular localization

Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm
Cytoplasm


Gene function

KRT6, integral part of the intermediate filamentous network
KRT6IRS3, integral part of the intermediate filamentous network
KRT2A, integral part of the intermediate filamentous network
KRT1, integral part of the intermediate filamentous network
KRT3, integral part of the intermediate filamentous network
KRT4, integral part of the intermediate filamentous network
KRT8, integral part of the intermediate filamentous network
KRT18, integral part of the intermediate filamentous network


DNA SEQ ID NO: [illegible]
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Table4 



DNA SEQ ID NO:  Protein SEQ ID NO:  Gene Name  DBSNP ID     Type             Codon            AA-Seq
9               34                  ERBB2      [illegible]  [illegible]
9               34                  ERBB2      [illegible]  [illegible]
9               34                  ERBB2      rs1058808    coding-nonsynon  CCC|GCC          P|A
9               34                  ERBB2      [illegible]  [illegible]
9               34                  ERBB2      [illegible]  noncoding
9               34                  ERBB2      rs2313170    noncoding
9               34                  ERBB2      [illegible]  coding-nonsynon  [illegible]      I|V
9               34                  ERBB2      [illegible]  noncoding
9               34                  ERBB2      [illegible]  noncoding
9               34                  ERBB2      rs1810132    coding-nonsynon  ATC|GTC          I|V
9               34                  ERBB2      [illegible]  noncoding
14              39                  c-erbA-1   [illegible]  coding-synon     [illegible]      [illegible]
14              39                  c-erbA-1   [illegible]  coding-synon     [illegible]      A|A
14              39                  c-erbA-1   [illegible]  coding-nonsynon  [illegible]      [illegible]
14              39                  c-erbA-1   rs3471       noncoding
19              44                  TOP2A      rs13695      noncoding
19              44                  TOP2A      rs471692     noncoding
19              44                  TOP2A      [illegible]  [illegible]
19              44                  TOP2A      [illegible]  noncoding
19              44                  TOP2A      [illegible]  coding-synon     [illegible]      [illegible]
19              44                  TOP2A      rs520630     noncoding
19              44                  TOP2A      rs782774     coding-nonsynon  AAT|ATT|ATT|TTT  N|I|I|F
19              44                  TOP2A      rs565121     noncoding
19              44                  TOP2A      rs2586112    noncoding
19              44                  TOP2A      rs532299     coding-nonsynon  TTT|GTT          F|V
19              44                  TOP2A      rs2732786    noncoding
19              44                  TOP2A      rs1804539    noncoding
19              44                  TOP2A      rs1804538    noncoding
19              44                  TOP2A      rs1804537    noncoding
19              44                  TOP2A      rs1141364    coding-synon     AAA|AAG          K|K
23              48                  KRT10      rs12231      noncoding
23              48                  KRT10      rs1132259    coding-nonsynon  CAT|CGT          H|R
23              48                  KRT10      rs1132257    coding-synon     CTG|TTG          L|L
23              48                  KRT10      rs1132256    coding-synon     GCC|GCT          A|A
23              48                  KRT10      rs1132255    coding-synon     CTG|TTG          L|L
23              48                  KRT10      rs1132254    coding-synon     GGC|GGT          G|G
23              48                  KRT10      rs1132252    coding-synon     TTC|TTT          F|F
23              48                  KRT10      rs1132268    coding-nonsynon  CAG|GAG          Q|E
23              48                  KRT10      rs1132258    coding-nonsynon  CGG|TGG          R|W
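
The Codon and AA-Seq columns of Table 4 are related through the standard genetic code: a polymorphism is labelled coding-synon when both codons encode the same amino acid and coding-nonsynon when they do not. The following Python sketch is illustrative only and is not part of the disclosed methods. The names CODON_TABLE and classify_codon_pair are hypothetical, and the sketch assumes the standard genetic code and a codon cell written with "|" separating the alleles, as in Table 4.

```python
"""Illustrative sketch: derive AA-Seq and Type from a Table 4 codon cell."""

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical helper table: maps each DNA codon to its amino acid.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_pair(codon_cell):
    """Return (aa_cell, type_label) for a codon cell such as 'CCC|GCC'."""
    codons = [c.strip().upper() for c in codon_cell.split("|")]
    amino_acids = [CODON_TABLE[c] for c in codons]
    # Synonymous only if every allele encodes the same amino acid.
    if len(set(amino_acids)) == 1:
        label = "coding-synon"
    else:
        label = "coding-nonsynon"
    return "|".join(amino_acids), label


if __name__ == "__main__":
    for cell in ("CCC|GCC", "CTG|TTG", "CGG|TGG"):
        aa_cell, label = classify_codon_pair(cell)
        print(cell, aa_cell, label)
```

Applied to legible rows of Table 4, the sketch reproduces the listed entries, for example P|A (coding-nonsynon) for CCC|GCC and L|L (coding-synon) for CTG|TTG.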
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a 


Z24029 


1 


G05498 X53777 


Z51080 


■ 


Z52895 


L29873 


G07286 Z39013


1 L29870 


1 


G15195 


1 


Z52854 


G07073 X03438 


l 


G11172 X55068


G14779 T50487 


G11580 T50487 


Z51301 


Z52160 


Z53182 


Z52130 


G06069 


G15440 


G11900 


G13994 


06909X 


Z53675 


1 


PCR size (bp)

128-142
[illegible]
171-318
71-103
[illegible]
119-151
[illegible]
151-152
[illegible]
[illegible]
[illegible]
[illegible]
150-166
102-103
[illegible]
[illegible]
[illegible]
[illegible]
156-168
167-173
239-251
[illegible]
[illegible]
[illegible]
[illegible]
[illegible]
[illegible]
237-253
[illegible]


reverse 


TGCCGTGCCAGAGAGA 


GCCCAGCCTGTCACTTATTC 


ICAGCATTGGATGCAATCC 


AGGACAGTGTGTAGCCCTTC 


GCTGGCCATATATATATTTAAACC 


TGCCTACTGGAAACCAGA 


NGGAGGTTGCAGTGAGCCAAGAT 


TTGTTTCCCITTGACTTTCTGA 


IGTCTGGGTCTTTATGGNGCTTGTG 


GCATACAGCACCCTCTACCT 


CAGGAGTGAGACACTCTCCATG 


GCGTGTCTGTCTCCATGTGTGC 


CTGGAGGTTGGCTTGTGGAT 


CCACCCAGAAAAACAGGAGA 


TTGTCACCCCATTGCCTTTC 


CTTGTCCCTTCTCAATCCTCC 


GATTACAGTGCTCCCTCTCCC 


GATTACAGTGCTCCCTCTCCC 


TTTAGACTTGGTAACTGCCG 


GCCATGATCTCCCAAAGCC 


AAACTGTGTGTGTCAAAGGATACT 


TACATGAAGGCATGGTCTG 


GCTGCGGACCAGACAGAT 


CCCACGGCTTTCTTGATCTA 


CACTCAACTCAACAGTCTAAAGGTG 


ATTCAGCCTCAGTTCACTGCTTC 


TAGTCTCTGGGACACCCAGA 


ACTGACTGCGCCACTGC 


[illegible]


forward 


ACAGTCTATCAAGCAGAAAAATCCT 


GACAACAGAGCGAGACTCCC 


TGGTCATTCGACAACGAA 


CTCCAGAATCCAGACCATGA 


GGAAGAATCAAATAGACAAT 


CATAGGTATGTTCAGAAATGTGA 


AAGGGGAAGGGGCTTTCAAAGCT 


CAAAAGCTTATGATGCTCAAACC 


iTAGGTTCACCTCrCATTTTCTTCAG 


CGGACCAGAGTGTTCCATGG 


AGGGGAGAATAAATAAAATCTGTGG 


TGGATTCACTGACTCAGCCTGC 


TCCCCAATGACGGTGATG 


GGTCCCACGAATTTGCTG 


TCGATCTCCTGACCTTGTGA 


CCTTGGATAGATTCAGCTCCC 


TTAAGCAAGGTTTTAATTAAGCTGC 


GGTTTTAATTAAGCTGCATGGC 


AGTTTGACACTGAGGCTTTG 


TGCAGATGCCTAAGAACTTTTCAG 


TCGAGGTTATGGTGAGCC 


GCTGATCTGAAGCCAATGA 


CTAATATAATCCTGGGCACATGG 


GATAAAAACAAGCACTGGCTCC 


TGTAATGTAAGCCCCATGAGG 


GTGAGTTCAAGCATAGTAATTATCC 


GATCCAGTGGAGACTCAGAG 


ATTCCTGAGTGTCTACCCTGTTGAG 


TCGAGAAGGACAAAATCACC 


ID

D17S946
D17S1181
D17S2026
D17S838
D17S1818
D17S614
D17S1851
[illegible]

PCR size (bp)

[illegible]


reverse 


lAGTCAGCTGAGATTGTGCC 1 


CCCCACACACAGCTCATATG 


GCAACAGAGGGAGACTCCAA 


TATGGAGTACCTACTCTATGCCAGG 


AGCTGTGACAAATGCCTGTA 


AAGATGGGGCAGGAATGG 


AAACACCACACTCTCCCCTG 


GGGGCAGACGACTTCTCCTT 


ATTCACCTAATGAGGATTCTTCTTT 


1 
1 

8 


ITGACGTGCTATTTCCTGTTTTGTCT 


ICAACACACTACCCCAGGA 


GAGTCCGCTACCTGAGTGCT 


forward 


jGTTCTTTCCTCTTGTGGGG 


CAAGCCAAGACATCCCAGTT 


TTTTCTCTCTCATTCCATTGGG 


TCCCATCCCGTAAGACCTC 


ATTCAAAGCTGGATCCCTTT 


1 

1 
1 

H 


TCACTGTCCTCCAAGCCAG 


TTCTTGGGCTTCCCGTAGCC 


GGGGATACAACCTTTAAAGTTCC 


GCTGAAATAGCCATCTTGAGCTAC 


I CTTTCACTCTTTCAGCTGAAGAGG 


GTTTGTTGCTATGCCTGC 


ACTCCTCATCTGTAGGGTCT 


ID 


D17S964


L9S1091 


L7S1179 


10S2160 
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L7S2038 


17S2091 
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